Abstract

We consider the estimation of the value of a linear functional of the slope parameter in
functional linear regression, where scalar responses are modeled in dependence of random 
functions. In Johannes and Schenk [2010] it has been shown that a plug-in estimator based 
on dimension reduction and additional thresholding can attain minimax optimal rates of 
convergence up to a constant. However, this estimation procedure requires an optimal 
choice of a tuning parameter with regard to certain characteristics of the slope function 
and the covariance operator associated with the functional regressor. As these are unknown 
in practice, we investigate a fully data-driven choice of the tuning parameter based on a 
combination of model selection and Lepski's method, which is inspired by the recent work
of Goldenshluger and Lepski [2011]. The tuning parameter is selected as the minimizer 
of a stochastic penalized contrast function imitating Lepski's method among a random 
collection of admissible values. We show that this adaptive procedure attains the lower 
bound for the minimax risk up to a logarithmic factor over a wide range of classes of slope 
functions and covariance operators. In particular, our theory covers point-wise estimation 
as well as the estimation of local averages of the slope parameter. 
1 Introduction

The functional linear model with scalar response describes the relationship between a real 
random variable Y and the variation of a functional regressor X. Usually, the random function 
X is assumed to be square integrable or more generally to take its values in a separable Hilbert 
space $\mathbb{H}$ with the inner product $\langle\cdot,\cdot\rangle_{\mathbb{H}}$ and associated norm $\|\cdot\|_{\mathbb{H}}$. For convenience of notation we
assume that the regressor $X$ is centered in the sense that for all $h \in \mathbb{H}$ the real-valued random
variable $\langle X, h\rangle_{\mathbb{H}}$ has mean zero. The linear relationship between $Y$ and $X$ is expressed by the
equation

\[ Y = \langle \phi, X\rangle_{\mathbb{H}} + \sigma\varepsilon, \qquad \sigma > 0, \tag{1.1} \]

with the unknown slope parameter $\phi \in \mathbb{H}$ and a real-valued, centered and standardized error
term $\varepsilon$. The objective of this paper is the fully data-driven estimation of the value of a known
linear functional of the slope $\phi$ based on an independent and identically distributed (i.i.d.)
sample of $(Y, X)$ of size $n$.

The estimation of the value of a linear functional offers a general framework for naturally
arising related estimation problems, such as estimating the value of $\phi$ - or of one of its
derivatives - at a given point or estimating the average of $\phi$ over a subinterval of its domain.

There is extensive literature available on the topic of non-parametric estimation of the 
value of a linear functional from Gaussian white noise observations (in case of direct obser- 
vations see Speckman [1979], Li [1982] or Ibragimov and Has'minskii [1984], while in case of 
indirect observations we refer to Donoho and Low [1992], Donoho [1994] or Goldenshluger 
and Pereverzev [2000] and references therein). In the situation of a functional linear model 
as considered in (1.1), which in general does not lead to Gaussian white noise observations,
Johannes and Schenk [2010] have investigated the minimax optimal performance of a plug-in
estimator for the value of a linear functional $\ell$ evaluated at $\phi$. For this purpose the slope $\phi$ is
replaced in $\ell(\phi)$ by a suitable estimator $\hat\phi_{m^*_n}$ depending on a tuning parameter $m^*_n \in \mathbb{N}$.
However, their choice of the tuning parameter is not data-driven. In the present paper we develop
a data-driven selection procedure which features comparable minimax-optimal properties.

The non-parametric estimation of the slope function $\phi$ has been an issue of growing interest
in the recent literature and a variety of such estimators have been studied. For example, Bosq
[2000], Cardot et al. [2007] or Müller and Stadtmüller [2005] analyze a functional principal
components regression, while a penalized least squares approach combined with projection onto
some basis (such as splines) is examined in Ramsay and Dalzell [1991], Eilers and Marx [1996],
Cardot et al. [2003], Hall and Horowitz [2007] or Crambes et al. [2009]. Cardot and Johannes
[2010] investigate a linear Galerkin approach coming from the inverse problem community (cf.
Efromovich and Koltchinskii [2001] and Hoffmann and Reiß [2008]). The resulting thresholded
projection estimator $\hat\phi_{m^*_n}$ is used by Johannes and Schenk [2010] in their plug-in estimation
procedure $\hat\ell_{m^*_n} := \ell(\hat\phi_{m^*_n})$ for the value $\ell(\phi)$ of a linear functional evaluated at $\phi$.

It has been shown in Johannes and Schenk [2010] that the attainable rate of convergence of
the plug-in estimator is basically determined by the a priori conditions on the solution $\phi$ and
the covariance operator $\Gamma$ associated with the regressor $X$ (defined below). These conditions
are expressed in the form $\phi \in \mathcal{F}$ and $\Gamma \in \mathcal{G}$, for suitably chosen classes $\mathcal{F} \subset \mathbb{H}$ and $\mathcal{G}$; we
postpone their formal introduction along with their interpretation to Section 2. Moreover, the
accuracy of any estimator $\tilde\ell$ of the value $\ell(\phi)$ has been assessed by its maximal mean squared
error with respect to these classes, that is

\[ \mathcal{R}^\ell[\tilde\ell;\mathcal{F},\mathcal{G}] := \sup_{\phi\in\mathcal{F}}\sup_{\Gamma\in\mathcal{G}} \mathbb{E}|\tilde\ell - \ell(\phi)|^2. \]

The main purpose of Johannes and Schenk [2010] has been to derive a lower bound

\[ \inf_{\tilde\ell}\mathcal{R}^\ell[\tilde\ell;\mathcal{F},\mathcal{G}] \ge C^{-1}\cdot\mathcal{R}^\ell_*[n^{-1};\mathcal{F},\mathcal{G}], \]

where the infimum is taken over all estimators $\tilde\ell$, and to prove that the estimator satisfies

\[ \mathcal{R}^\ell[\hat\ell_{m^*_n};\mathcal{F},\mathcal{G}] \le C\cdot\mathcal{R}^\ell_*[n^{-1};\mathcal{F},\mathcal{G}], \quad\text{with } 0 < C < \infty, \]

for a variety of classes $\mathcal{F}$ and $\mathcal{G}$. In other words it has been shown that $\mathcal{R}^\ell_*[n^{-1};\mathcal{F},\mathcal{G}]$ is
the minimax-optimal rate attained by the estimator $\hat\ell_{m^*_n}$. The optimal performance of the
estimator depends crucially on the choice $m^*_n$ of the tuning parameter, which in turn relies
strongly on a priori knowledge of the sets $\mathcal{F}$ and $\mathcal{G}$. However, this information is widely
inaccessible in practice.

The aim of the present paper consists in proposing a fully data-driven selection procedure 
for the tuning parameter. Our selection method combines model selection (cf. Barron et al.
[1999] and its detailed discussion in Massart [2007]) and Lepski's method (cf. Lepski [1990]
and its recent review in Mathé [2006]). It is inspired by the recent work of Goldenshluger
and Lepski [2011] who consider data-driven bandwidth selection in kernel density estimation.
We choose the appropriate tuning parameter $\hat m$ as the minimizer of a stochastic penalized
contrast function imitating Lepski's method among a random collection of admissible values.
Furthermore, we show that the maximal risk of the resulting estimator $\hat\ell_{\hat m}$ satisfies

\[ \mathcal{R}^\ell[\hat\ell_{\hat m};\mathcal{F},\mathcal{G}] \le C\cdot\mathcal{R}^\ell_*[(1+\log n)n^{-1};\mathcal{F},\mathcal{G}] \quad\text{for } 0 < C < \infty, \]

for a variety of classes $\mathcal{F}$ and $\mathcal{G}$. The upper bound in the last display features a logarithmic
factor when compared to the minimax rate of convergence $\mathcal{R}^\ell_*[n^{-1};\mathcal{F},\mathcal{G}]$ which possibly results
in a deterioration of the rate. Therefore, the completely data-driven estimator is optimal or
nearly optimal in the minimax sense simultaneously over a variety of both solution sets $\mathcal{F}$ and
classes of operators $\mathcal{G}$. We call such estimation procedures adaptive. The appearance of the
logarithmic factor within the rate is a known fact in the context of local estimation (cf. Laurent
et al. [2008] who consider model selection given direct Gaussian observations). Brown and Low
[1996] show that it is unavoidable in the context of non-parametric Gaussian regression and,
hence, it is widely considered as an acceptable price for adaptation. This factor is also present
in the work of Goldenshluger and Pereverzev [2000] where Lepski's method is applied
in the presence of indirect Gaussian observations. In contrast to this situation the operator is
not known in advance in functional linear regression and hence a straightforward application
of their results is not obvious. We will show that our proposed data-driven estimation method
attains the minimax rates up to a logarithmic factor for a variety of classes of both slope
functions and covariance operators.
The paper is organized as follows: in Section 2 we introduce the adaptive estimation
procedure and review the available minimax theory as presented in Johannes and Schenk 
[2010]. In Section 3 we present the key arguments of the proof of an upper risk bound for the 
adaptive estimator, while more technical aspects of the proof are deferred to the Appendix. 
We discuss the examples of point-wise and local average estimation in Section 4. 

2 Methodology and review 

We suppose that the regressor $X$ has a finite second moment, i.e., $\mathbb{E}\|X\|_{\mathbb{H}}^2 < \infty$, and that
$X$ is uncorrelated to the random error $\varepsilon$ in the sense that $\mathbb{E}[\varepsilon\langle X, h\rangle_{\mathbb{H}}] = 0$ for all $h \in \mathbb{H}$, as
usually assumed in this context, see for example Bosq [2000], Cardot et al. [2003] or Cardot
et al. [2007]. Multiplying both sides in (1.1) by $\langle X, h\rangle_{\mathbb{H}}$ and taking the expectation leads to
the normal equation

\[ \langle g, h\rangle_{\mathbb{H}} := \mathbb{E}[Y\langle X, h\rangle_{\mathbb{H}}] = \mathbb{E}[\langle\phi, X\rangle_{\mathbb{H}}\langle X, h\rangle_{\mathbb{H}}] =: \langle\Gamma\phi, h\rangle_{\mathbb{H}}, \quad \forall h \in \mathbb{H}, \tag{2.1} \]

where $g$ belongs to $\mathbb{H}$ and $\Gamma$ denotes the covariance operator associated with the random
function $X$. In what follows we assume that there exists a unique solution $\phi \in \mathbb{H}$ of equation
(2.1), i.e., that $\Gamma$ is strictly positive and that its range contains $g$ (for a detailed discussion we
refer to Cardot et al. [2003]). Obviously, these conditions are sufficient for the identification of
the value $\ell(\phi)$. Since the estimation of $\phi$ involves an inversion of the covariance operator $\Gamma$ it is
called an inverse problem. Moreover, due to the finite second moment of the regressor $X$, the
associated covariance operator $\Gamma$ is nuclear, i.e., its trace is finite. Therefore, the reconstruction
of $\phi$ leads to an ill-posed inverse problem (with the additional difficulty that $\Gamma$ is unknown and
has to be estimated). In the following we assume that the joint distribution of the regressor and
error term is Gaussian; more precisely, we suppose that for any finite set $\{h_1, \dots, h_{k-1}\} \subset \mathbb{H}$ the
vector $(\langle X, h_1\rangle_{\mathbb{H}}, \dots, \langle X, h_{k-1}\rangle_{\mathbb{H}}, \varepsilon)$ follows a $k$-dimensional multivariate normal distribution.

Remark 2.1. The assumption of Gaussianity is not essential for the proof of our main result. 
This assumption on the distributions of the error and the regressor is only used to prove the 
bounds given in Lemma C.2. Analogues of the results can be shown at the cost of longer proofs 
under appropriately chosen moment conditions. □ 

2.1 Adaptive Estimation Procedure 

Introduction of the estimator. In order to derive an estimator for the unknown slope
function $\phi$ we follow the presentation of Johannes and Schenk [2010] and base our reconstruction
on the development of $\phi$ in an arbitrary orthonormal basis. Here and subsequently, we
fix a pre-specified orthonormal basis $\{\psi_j\}_{j=1}^\infty$ of $\mathbb{H}$ which does in general not correspond to
the eigenfunctions of the operator $\Gamma$ defined in (2.1). We require in the following that the
slope function $\phi$ belongs to a function class $\mathcal{F}$ containing $\{\psi_j\}_{j\ge1}$ and, moreover, that $\mathcal{F}$ is
included in the domain of the linear functional $\ell$. For technical reasons and without loss of
generality we assume that $\ell(\psi_1) = 1$, which can always be ensured by reordering and rescaling,
except for the trivial case $\ell \equiv 0$. With respect to this basis, we consider for all $h \in \mathbb{H}$ the
development $h = \sum_{j=1}^\infty [h]_j\psi_j$ where the sequence $[h] := ([h]_j)_{j\ge1}$ of generalized Fourier
coefficients $[h]_j := \langle h, \psi_j\rangle_{\mathbb{H}}$ is square-summable, i.e., $\|h\|_{\mathbb{H}}^2 = \sum_{j\ge1}[h]_j^2 < \infty$. Given a dimension
parameter $m \in \mathbb{N}$ we have the subspace $\mathbb{H}_m$ - spanned by the basis functions $\{\psi_j\}_{j=1}^m$ - at our
disposal and we call $\phi_m \in \mathbb{H}_m$ a Galerkin solution of $g = \Gamma\phi$, if $\|g - \Gamma\phi_m\|_{\mathbb{H}} \le \|g - \Gamma h\|_{\mathbb{H}}$
for all $h \in \mathbb{H}_m$. Since $\Gamma$ is strictly positive it is easily seen that the Galerkin solution $\phi_m$
of $g = \Gamma\phi$ exists uniquely. Let us introduce for any function $h$ the $m$-dimensional vector
of coefficients $[h]_{\underline m} := ([h]_j)_{1\le j\le m}$ and for the operator $\Gamma$ the $(m \times m)$-dimensional matrix
$[\Gamma]_{\underline m} := (\langle\psi_j, \Gamma\psi_k\rangle_{\mathbb{H}})_{1\le j,k\le m}$. Then the Galerkin solution satisfies $[\Gamma]_{\underline m}[\phi_m]_{\underline m} = [g]_{\underline m}$. Since
$\Gamma$ is injective, the matrix $[\Gamma]_{\underline m}$ is non-singular for all $m \ge 1$ and therefore the Galerkin solution
$\phi_m \in \mathbb{H}_m$ is uniquely determined by the vector of coefficients $[\phi_m]_{\underline m} = [\Gamma]_{\underline m}^{-1}[g]_{\underline m}$ and $[\phi_m]_j = 0$
for $j > m$. In order to derive an estimator for the vector $[\phi_m]_{\underline m}$, we replace the unknown
quantities $[g]_{\underline m}$ and $[\Gamma]_{\underline m}$ by their empirical counterparts and apply additional thresholding.
We observe that $[\Gamma]_{\underline m} = \mathbb{E}[X]_{\underline m}[X]_{\underline m}^t$ and $[g]_{\underline m} = \mathbb{E}Y[X]_{\underline m}$; therefore, given an i.i.d. sample
$\{(Y_i, X_i)\}_{i=1}^n$ of $(Y, X)$, it is natural to consider the estimators $[\hat g]_{\underline m} := \frac{1}{n}\sum_{i=1}^n Y_i[X_i]_{\underline m}$ and
$[\hat\Gamma]_{\underline m} := \frac{1}{n}\sum_{i=1}^n [X_i]_{\underline m}[X_i]_{\underline m}^t$. Let us denote by $\|[\hat\Gamma]_{\underline m}^{-1}\|_s$ the spectral norm of $[\hat\Gamma]_{\underline m}^{-1}$, i.e., its
largest eigenvalue, and define the estimator $\hat\phi_m \in \mathbb{H}_m$ by means of the coefficients $[\hat\phi_m]_j = 0$
for $j > m$ and

\[ [\hat\phi_m]_{\underline m} := \begin{cases} [\hat\Gamma]_{\underline m}^{-1}[\hat g]_{\underline m}, & \text{if } [\hat\Gamma]_{\underline m} \text{ is non-singular and } \|[\hat\Gamma]_{\underline m}^{-1}\|_s \le n,\\ 0, & \text{otherwise.} \end{cases} \]

Observe that $\ell(\hat\phi_m) = (\ell(\psi_1), \dots, \ell(\psi_m))[\hat\phi_m]_{\underline m} =: [\ell]_{\underline m}^t[\hat\phi_m]_{\underline m}$ with the slight abuse of notation
$[\ell]_{\underline m} := ([\ell]_j)_{1\le j\le m}$ and generic elements $[\ell]_j := \ell(\psi_j)$. In Johannes and Schenk [2010] it has
been shown that the estimator $\hat\ell_m := \ell(\hat\phi_m)$ with optimally chosen dimension parameter $m$
can attain minimax-optimal rates of convergence. This choice involves certain characteristics
of the slope $\phi$ and the covariance operator $\Gamma$ which are unavailable in practice. In the next
paragraph we introduce a fully data-driven selection method for the dimension parameter.
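
The construction of the thresholded plug-in estimator can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: it assumes that the basis coefficients of each regressor have already been computed and stored row-wise, it relies on NumPy, and the names `thresholded_plug_in`, `X_coef` and `ell` are hypothetical.

```python
import numpy as np


def thresholded_plug_in(Y, X_coef, ell):
    """Return (ell_hat_m, phi_hat_m) for the thresholded projection estimator.

    Y      : length-n responses Y_i
    X_coef : (n, m) array, row i holds the coefficients [X_i]_1, ..., [X_i]_m
             (assumed to be precomputed in the chosen orthonormal basis)
    ell    : length-m vector (ell(psi_1), ..., ell(psi_m))
    """
    Y = np.asarray(Y, dtype=float)
    X_coef = np.asarray(X_coef, dtype=float)
    ell = np.asarray(ell, dtype=float)
    n, m = X_coef.shape

    g_hat = X_coef.T @ Y / n            # empirical counterpart of [g]_m
    Gamma_hat = X_coef.T @ X_coef / n   # empirical counterpart of [Gamma]_m

    # Gamma_hat is symmetric positive semi-definite, so the spectral norm of
    # its inverse is the reciprocal of its smallest eigenvalue.
    lam_min = np.linalg.eigvalsh(Gamma_hat)[0]
    if lam_min <= 0.0 or 1.0 / lam_min > n:
        phi_hat = np.zeros(m)           # threshold: singular or too ill-conditioned
    else:
        phi_hat = np.linalg.solve(Gamma_hat, g_hat)
    return float(ell @ phi_hat), phi_hat
```

Working with the smallest eigenvalue avoids forming the inverse matrix explicitly; the thresholding condition itself is the one stated above.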

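Introduction of the adaptive estimation procedure. Our selection method is inspired
by the recent work of Goldenshluger and Lepski [2011] and combines the techniques of model
selection and Lepski's method. We determine the dimension parameter among a collection of
admissible values by minimizing a penalized contrast function. To this end, we define for all
$n \ge 1$ the value $M_n^\ell := \max\{1 \le m \le \lfloor n^{1/4}\rfloor : [\ell]_{\underline m}^t[\ell]_{\underline m} \le n\}$, where $\lfloor a\rfloor$ denotes as usual the
integer part of $a \in \mathbb{R}$, and introduce the random integer

\[ \hat M_n := \min\big\{2 \le m \le M_n^\ell : \|[\hat\Gamma]_{\underline m}^{-1}\|_s\,[\ell]_{\underline m}^t[\ell]_{\underline m} > n(1+\log n)^{-1}\big\} - 1. \tag{2.2} \]

Furthermore, we define a stochastic penalty sequence $\hat p := (\hat p_m)_{1\le m\le\hat M_n}$ by

\[ \hat p_m := 700\Big(\frac{2}{n}\sum_{i=1}^n Y_i^2 + 2[\hat g]_{\underline m}^t[\hat\Gamma]_{\underline m}^{-1}[\hat g]_{\underline m}\Big)\cdot\max_{1\le k\le m}[\ell]_{\underline k}^t[\hat\Gamma]_{\underline k}^{-1}[\ell]_{\underline k}\cdot\frac{1+\log n}{n}. \]

The random integer $\hat M_n$ and the stochastic penalty $\hat p$ are used to define a contrast by

\[ \hat\kappa_m := \max_{m\le k\le\hat M_n}\big\{|\hat\ell_k - \hat\ell_m|^2 - \hat p_k\big\}, \qquad 1\le m\le\hat M_n. \]

For a subset $A \subset \mathbb{N}$ and a sequence $(a_m)_{m\ge1}$ with minimal value in $A$ we set
$\arg\min_{m\in A}\{a_m\} := \min\{m : a_m \le a_{m'}, \forall m' \in A\}$ and select the dimension parameter

\[ \hat m := \arg\min_{1\le m\le\hat M_n}\{\hat\kappa_m + \hat p_m\}. \tag{2.3} \]

The estimator of $\ell(\phi)$ is now given by $\hat\ell_{\hat m}$ and we will derive an upper bound for its risk below.
By construction the choice of the dimension parameter and hence the estimator $\hat\ell_{\hat m}$ rely only
on the data and in particular not on the regularity assumptions on the slope and the operator
which we formalize in the next section.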
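
The selection step can be sketched as follows, assuming the estimates $\hat\ell_1,\dots,\hat\ell_M$ and a non-decreasing penalty sequence have already been computed. The function name `select_dimension` and its list-based interface are hypothetical and only serve as an illustration of the contrast-plus-penalty rule; the contrast is taken as the maximum over $k \ge m$ of $|\hat\ell_k - \hat\ell_m|^2$ minus the penalty at $k$, as in Lemma 3.2.

```python
def select_dimension(ell_hat, pen):
    """Return the selected dimension (1-based) for estimates and penalties.

    ell_hat : list with ell_hat[m-1] the plug-in estimate at dimension m, m = 1..M
    pen     : list with pen[m-1] the (non-decreasing) penalty at dimension m
    """
    M = len(ell_hat)
    best_m, best_val = None, None
    for m in range(1, M + 1):
        # contrast: largest penalised squared deviation from higher-dimensional estimates
        contrast = max(
            (ell_hat[k - 1] - ell_hat[m - 1]) ** 2 - pen[k - 1]
            for k in range(m, M + 1)
        )
        val = contrast + pen[m - 1]
        # strict inequality keeps the smallest minimiser, matching the arg min convention
        if best_val is None or val < best_val:
            best_m, best_val = m, val
    return best_m
```

For instance, if the estimates stabilise from the second dimension onwards and the penalties are small, the rule picks the smallest dimension at which the estimates have stabilised.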

2.2 Review of minimax theory

We express our a priori knowledge about the unknown slope parameter and covariance operator
in the form $\phi \in \mathcal{F}$ and $\Gamma \in \mathcal{G}$. The class $\mathcal{F}$ reflects information on the solution $\phi$, e.g., its level
of smoothness, whereas the assumption $\Gamma \in \mathcal{G}$ typically results in conditions on the decay of the
eigenvalues of the operator $\Gamma$. The following construction of the classes $\mathcal{F}$ and $\mathcal{G}$ will be flexible
enough to characterize, in particular, differentiable or analytic slope functions and allows us
to discuss both a polynomial and exponential decay of the covariance operator's eigenvalues.

Assumptions and notations. With respect to the basis $\{\psi_j\}_{j\ge1}$ and given a strictly
positive sequence of weights $(w_j)_{j\ge1}$, or $w$ for short, we define the weighted norm $\|\cdot\|_w$ by
$\|h\|_w^2 := \sum_{j\ge1} w_j[h]_j^2$ for $h \in \mathbb{H}$. Throughout the rest of the paper let $\beta$ be a non-decreasing
sequence of weights with $\beta_1 = 1$ such that the slope parameter $\phi$ belongs to the ellipsoid

\[ \mathcal{F}_\beta^r := \{h \in \mathbb{H} : \|h\|_\beta^2 \le r\} \quad\text{with radius } r > 0. \]

In order to guarantee that $\mathcal{F}_\beta^r$ is contained in the domain of the linear functional $\ell$ and
that $\ell(h) = \sum_{j\ge1}[\ell]_j[h]_j$ for all $h \in \mathcal{F}_\beta^r$, we assume that $\sum_{j\ge1}[\ell]_j^2\beta_j^{-1} < \infty$. Notice that
we do not require that the sequence $[\ell] = ([\ell]_j)_{j\ge1}$
tends to zero nor that it is square summable. However, if it is square summable then $\mathbb{H}$ is the
domain of $\ell$. Moreover, $[\ell]$ coincides with the sequence of generalized Fourier coefficients of the
representer of $\ell$ given by Riesz's theorem.

As usual in the context of ill-posed inverse problems, we link the mapping properties of
the covariance operator $\Gamma$ and the regularity conditions on $\phi$. To this end, we consider the
sequence $(\langle\Gamma\psi_j, \psi_j\rangle_{\mathbb{H}})_{j\ge1} =: ([\Gamma]_{jj})_{j\ge1}$. Since $\Gamma$ is nuclear, this sequence is summable and hence
vanishes as $j$ tends to infinity. In what follows we impose restrictions on the decay of this
sequence. Let $\mathcal{G}$ denote the set of all strictly positive nuclear operators defined on $\mathbb{H}$. We
suppose that there exists a strictly positive, summable sequence of weights $\gamma$ with $\gamma_1 = 1$ such
that $\Gamma$ belongs to the subset

\[ \mathcal{G}_\gamma^d := \{T \in \mathcal{G} : d^{-2}\|h\|_{\gamma^2}^2 \le \|Th\|_{\mathbb{H}}^2 \le d^2\|h\|_{\gamma^2}^2, \ \forall h \in \mathbb{H}\} \quad\text{with } d \ge 1, \]

where we understand here and subsequently arithmetic operations on a sequence of real numbers
component-wise, e.g., we write $\gamma^2$ for $(\gamma_j^2)_{j\ge1}$. Notice that for $\Gamma \in \mathcal{G}_\gamma^d$ it follows that
$d^{-1}\gamma_j \le [\Gamma]_{jj} \le d\gamma_j$. Moreover, if $\lambda$ denotes its sequence of eigenvalues, then $d^{-1}\gamma_j \le \lambda_j \le d\gamma_j$,
which justifies the condition $\sum_{j\ge1}\gamma_j < \infty$. Let us summarize the previous conditions:

Assumption 2.1. The sequences $1/\beta$ and $\gamma$ are monotonically decreasing with limit zero and
$\beta_1 = \gamma_1 = 1$ such that $\sum_{j\ge1}[\ell]_j^2\beta_j^{-1} < \infty$ and $\sum_{j\ge1}\gamma_j < \infty$.

Illustration. We illustrate the last assumption for typical choices of the sequences $\beta$, $\gamma$ and
$[\ell]$. Consider $[\ell]_j^2 = |j|^{-2s}$ and:

(pp) $\beta_j = |j|^{2p}$, $\gamma_j = |j|^{-2a}$ with $p > 0$, $a > 1/2$ and $s > 1/2 - p$;
(pe) $\beta_j = |j|^{2p}$, $\gamma_j = \exp(-|j|^{2a} + 1)$ with $p > 0$, $a > 0$ and $s > 1/2 - p$;
(ep) $\beta_j = \exp(|j|^{2p} - 1)$, $\gamma_j = |j|^{-2a}$ with $p > 0$, $a > 1/2$ and $s \in \mathbb{R}$;

then Assumption 2.1 holds true in all cases.

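Minimax theory reviewed. Johannes and Schenk [2010] have derived a lower bound for
the minimax risk $\inf_{\tilde\ell}\mathcal{R}^\ell[\tilde\ell;\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$ and have shown that the proposed estimator $\hat\ell_m$ can attain
this lower bound up to a constant provided that the dimension parameter is chosen appropriately.
In order to formulate the minimax rate below let us define for $m \ge 1$ and $x \in (0, 1]$

\[ \mathcal{R}^\ell_m[x;\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] := \max\Big\{\sum_{j>m}\frac{[\ell]_j^2}{\beta_j},\ \max\Big(\frac{\gamma_m}{\beta_m}, x\Big)\sum_{j=1}^m\frac{[\ell]_j^2}{\gamma_j}\Big\} \]

and $\mathcal{R}^\ell_*[x;\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] := \min_{m\ge1}\mathcal{R}^\ell_m[x;\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$.

With this notation the lower bound, when considering an i.i.d. sample of size $n$, is basically a
multiple of $\mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$. To be more precise, if we define $m^*_n := \arg\min_{m\ge1}\mathcal{R}^\ell_m[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$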
and if Assumption 2.1 and inf„>i min ( , J"" ) > are satisfied then there exists a constant 
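$C > 0$ depending only on the classes and $\sigma^2$ such that we have for all $n \ge 1$

\[ \inf_{\tilde\ell}\mathcal{R}^\ell[\tilde\ell;\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \ge C\cdot\mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]. \]

On the other hand it is shown in Johannes and Schenk [2010] that $\mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$ provides
up to a constant an upper bound for the maximal risk of the proposed estimator $\hat\ell_{m^*_n}$. More
precisely, if we assume in addition $\sup_{m\ge1} m^3\gamma_m\beta_m^{-1} < \infty$ then there exists a constant $C > 0$
depending only on the classes and $\sigma^2$ such that we have for all $n \ge 1$

\[ \mathcal{R}^\ell[\hat\ell_{m^*_n};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \le C\cdot\mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]. \]

Consequently the rate $\mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$ is optimal and $\hat\ell_{m^*_n}$ is minimax-optimal.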
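
To make the rate concrete, the following sketch evaluates the two competing terms (the tail sum and the weighted head sum) numerically and returns the minimizing dimension. It is only an illustration under stated assumptions: the infinite tail sum is truncated at a hypothetical cut-off `J`, the minimum is searched over `1..m_max` only, and the function names are not taken from the paper.

```python
def risk_term(m, x, ell_sq, beta, gamma, J):
    """Maximum of the tail (bias-type) term and the weighted head (variance-type) term."""
    tail = sum(ell_sq(j) / beta(j) for j in range(m + 1, J + 1))   # truncated at J
    head = sum(ell_sq(j) / gamma(j) for j in range(1, m + 1))
    return max(tail, max(gamma(m) / beta(m), x) * head)


def minimax_dimension(x, ell_sq, beta, gamma, J, m_max):
    """Return (m, value) minimising risk_term over m = 1..m_max (smallest minimiser)."""
    values = [risk_term(m, x, ell_sq, beta, gamma, J) for m in range(1, m_max + 1)]
    best = min(range(m_max), key=values.__getitem__)  # first index attaining the minimum
    return best + 1, values[best]


# Example in the spirit of case (pp) with p = 1, a = 1, s = 0 and x = 1/n.
m_star, rate = minimax_dimension(
    1e-2, lambda j: 1.0, lambda j: float(j) ** 2, lambda j: float(j) ** -2,
    J=5000, m_max=100,
)
```

Decreasing `x` (i.e. increasing the sample size) moves the minimizer to larger dimensions and lowers the attained value, which is the bias-variance trade-off encoded in the definition.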
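Illustration continued. For the configurations defined below Assumption 2.1 the estimator
$\hat\ell_{m^*_n}$ with dimension parameter $m^*_n$ as given below is minimax optimal under the following
conditions. The minimax optimal rate of convergence is determined by the order of $\mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$.
Here and subsequently, we use for two strictly positive sequences $(x_n)_{n\ge1}$, $(y_n)_{n\ge1}$ the notation
$x_n \asymp y_n$, if $(x_n/y_n)_{n\ge1}$ is bounded away both from zero and infinity.

(pp) If $p > 0$, $a > 1/2$ and $p + a \ge 3/2$ then $m^*_n \asymp n^{1/(2p+2a)}$ and if $s > 1/2 - p$ then

\[ \mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp \begin{cases} n^{-(2p+2s-1)/(2p+2a)}, & \text{if } s - a < 1/2,\\ n^{-1}\log n, & \text{if } s - a = 1/2,\\ n^{-1}, & \text{if } s - a > 1/2. \end{cases} \]

(pe) If $p > 0$ and $a > 0$, then $m^*_n \asymp \log(n(\log n)^{-p/a})^{1/(2a)}$ and if $s > 1/2 - p$, then

\[ \mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp (\log n)^{-(2p+2s-1)/(2a)}. \]

(ep) If $p > 0$, $a > 1/2$ and $s \in \mathbb{R}$ then $m^*_n \asymp \log(n(\log n)^{-a/p})^{1/(2p)}$ and

\[ \mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp \begin{cases} n^{-1}(\log n)^{(2a-2s+1)/(2p)}, & \text{if } s - a < 1/2,\\ n^{-1}\log(\log n), & \text{if } s - a = 1/2,\\ n^{-1}, & \text{if } s - a > 1/2. \end{cases} \]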
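
As a worked check of case (pp) with $s - a < 1/2$, the balance between the two terms of the rate can be sketched as follows. This is only an outline of the calculation, under the choices $\beta_j = j^{2p}$, $\gamma_j = j^{-2a}$ and $[\ell]_j^2 = j^{-2s}$ from the illustration.

```latex
% Case (pp), s - a < 1/2: balance the two terms of R^\ell_m[n^{-1}; F, G].
\begin{align*}
\sum_{j>m}\frac{[\ell]_j^2}{\beta_j} &= \sum_{j>m} j^{-2s-2p} \asymp m^{-(2p+2s-1)},\\
\frac{\gamma_m}{\beta_m} &= m^{-(2p+2a)}, \qquad
\sum_{j=1}^{m}\frac{[\ell]_j^2}{\gamma_j} = \sum_{j=1}^{m} j^{2a-2s} \asymp m^{2a-2s+1},\\
m \asymp n^{1/(2p+2a)} &\;\Longrightarrow\; \frac{\gamma_m}{\beta_m} \asymp n^{-1}
\quad\text{and}\quad
n^{-1} m^{2a-2s+1} \asymp m^{-(2p+2s-1)} \asymp n^{-(2p+2s-1)/(2p+2a)}.
\end{align*}
```

Both terms are therefore of the same order at $m \asymp n^{1/(2p+2a)}$, which gives the first line of the (pp) display; the tail sum converges because $s > 1/2 - p$.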



3 Upper risk bound for the adaptive estimator 

The fully adaptive estimator $\hat\ell_{\hat m}$ of $\ell(\phi)$ relies on the choice of a random dimension parameter
$\hat m$ which does not involve any knowledge about the classes $\mathcal{F}_\beta^r$ and $\mathcal{G}_\gamma^d$. The main result of this
paper consists in an upper bound for the maximal risk $\mathcal{R}^\ell[\hat\ell_{\hat m};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$ given by the following
theorem. We present the main arguments of its proof in this section whereas the more technical
aspects are deferred to the appendix. We close this section by illustrating and discussing the
result.

Theorem 3.1. Assume an i.i.d. sample of $(Y, X)$ of size $n$ obeying (1.1) and let the joint
distribution of the random function $X$ and the error $\varepsilon$ be normal. Consider sequences $\beta$ and $\gamma$
satisfying Assumption 2.1. Define $m^\circ_n := \arg\min_{m\ge1}\mathcal{R}^\ell_m[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$ and suppose
that $\gamma_{m^\circ_n}^{-1}[\ell]_{\underline{m^\circ_n}}^t[\ell]_{\underline{m^\circ_n}} = o(n(1+\log n)^{-1})$ as $n \to \infty$; then there exists a constant $C > 0$ depending
on the classes $\mathcal{F}_\beta^r$ and $\mathcal{G}_\gamma^d$, the linear functional $\ell$, and $\sigma^2$ only such that

\[ \mathcal{R}^\ell[\hat\ell_{\hat m};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \le C\cdot\mathcal{R}^\ell_*[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d], \quad\text{for all } n \ge 1. \]

Remark 3.1. The last assertion states that the data-driven estimator can attain the minimax
rates up to a logarithmic factor for a variety of classes $\mathcal{F}_\beta^r$ and $\mathcal{G}_\gamma^d$. In this sense the estimator
adapts to both the slope function and the covariance operator. This result is derived under
the additional condition $\gamma_{m^\circ_n}^{-1}[\ell]_{\underline{m^\circ_n}}^t[\ell]_{\underline{m^\circ_n}} = o(n(1+\log n)^{-1})$ as $n \to \infty$, which naturally holds
true in the illustrations. □
We begin our reasoning by giving a preparatory lemma which constitutes a central step in
the following arguments.

Lemma 3.2. Let $(\phi_k)_{k\ge1}$ be an arbitrary sequence in $\mathbb{H}$ and $b := (b_m)_{m\ge1}$ the sequence of
approximation errors $b_m = \sup_{k\ge m}|\ell(\phi_k - \phi)|$ associated with $\ell(\phi)$. Consider an arbitrary
sequence of penalties $P := (P_m)_{m\ge1}$, an upper bound $M \in \mathbb{N}$, and the sequence $\kappa = (\kappa_m)_{m\ge1}$
of contrasts given by $\kappa_m := \max_{m\le k\le M}\{|\hat\ell_k - \hat\ell_m|^2 - P_k\}$. If the subsequence $(P_1, \dots, P_M)$ is
non-decreasing, then we have for the selected model $\hat m := \arg\min_{1\le m\le M}\{\kappa_m + P_m\}$ and for
all $1 \le m \le M$ that

\[ |\hat\ell_{\hat m} - \ell(\phi)|^2 \le 7P_m + 78\,b_m^2 + 42\max_{m\le k\le M}\Big(|\hat\ell_k - \ell(\phi_k)|^2 - \frac{1}{6}P_k\Big)_+ \tag{3.1} \]

where $(a)_+ = \max(a, 0)$.

Proof of Lemma 3.2. Since $(P_1, \dots, P_M)$ is non-decreasing it is easily verified that

\[ \kappa_m \le 6\max_{m\le k\le M}\Big(|\hat\ell_k - \ell(\phi_k)|^2 - \frac{1}{6}P_k\Big)_+ + 12\,b_m^2, \quad \forall\,1\le m\le M, \]

where we use that $2b_m \ge \max_{m\le k\le M}|\ell(\phi_k - \phi_m)|$. The last estimate implies the inequality

\[ |\hat\ell_m - \ell(\phi)|^2 \le \frac{1}{3}P_m + 2\,b_m^2 + 2\max_{m\le k\le M}\Big(|\hat\ell_k - \ell(\phi_k)|^2 - \frac{1}{6}P_k\Big)_+, \quad \forall\,1\le m\le M. \tag{3.2} \]

On the other hand, taking the definition of $\hat m$ into account, it is straightforward to see that

\begin{align*}
|\hat\ell_{\hat m} - \ell(\phi)|^2 &\le 3\big\{|\hat\ell_{\hat m} - \hat\ell_{\min(m,\hat m)}|^2 + |\hat\ell_{\min(m,\hat m)} - \hat\ell_m|^2 + |\hat\ell_m - \ell(\phi)|^2\big\}\\
&\le 3\big\{\kappa_m + P_{\hat m} + \kappa_{\hat m} + P_m + |\hat\ell_m - \ell(\phi)|^2\big\} \le 6\{\kappa_m + P_m\} + 3|\hat\ell_m - \ell(\phi)|^2.
\end{align*}

From the last estimates and (3.2) we obtain the assertion (3.1), which completes the proof. □
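
The constants in (3.1) can be checked by combining the three estimates of the proof. The display below is a short bookkeeping sketch; the shorthand $T_m$ is introduced here only for this check and is not notation from the paper.

```latex
% Shorthand (introduced for this check only):
%   T_m := \max_{m \le k \le M} ( |\hat\ell_k - \ell(\phi_k)|^2 - P_k/6 )_+ .
\begin{align*}
|\hat\ell_{\hat m} - \ell(\phi)|^2
&\le 6\{\kappa_m + P_m\} + 3\,|\hat\ell_m - \ell(\phi)|^2\\
&\le 6\{6\,T_m + 12\,b_m^2\} + 6\,P_m + 3\big\{\tfrac{1}{3}P_m + 2\,b_m^2 + 2\,T_m\big\}\\
&= (6+1)\,P_m + (72+6)\,b_m^2 + (36+6)\,T_m
 = 7\,P_m + 78\,b_m^2 + 42\,T_m .
\end{align*}
```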

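The proof of Theorem 3.1 requires in addition to the previous lemma two technical propositions
which we state now. For $n \ge 1$ and a positive sequence $a := (a_m)_{m\ge1}$ let us introduce
$M_n^\ell := \max\{1 \le m \le \lfloor n^{1/4}\rfloor : [\ell]_{\underline m}^t[\ell]_{\underline m} \le n\}$ and

\[ M_n(a) := \min\big\{2 \le m \le M_n^\ell : a_m\cdot[\ell]_{\underline m}^t[\ell]_{\underline m} > n(1+\log n)^{-1}\big\} - 1 \]

where we set $M_n(a) := M_n^\ell$ if the set is empty. Observe that $\hat M_n$ given in (2.2) satisfies
$\hat M_n = M_n(a)$ with $a = (\|[\hat\Gamma]_{\underline m}^{-1}\|_s)_{m\ge1}$. Consider for $m \ge 1$

\[ \sigma_m^2 := 2\,\mathbb{E}Y^2 + 2[g]_{\underline m}^t[\Gamma]_{\underline m}^{-1}[g]_{\underline m}, \qquad V_m := \max_{1\le k\le m}[\ell]_{\underline k}^t[\Gamma]_{\underline k}^{-1}[\ell]_{\underline k} \]

and define the penalty term

\[ p_m := 100\,\sigma_m^2 V_m(1+\log n)n^{-1}, \]

which are obviously the theoretical counterparts of the random objects used in the definition
of $\hat m$. The proof of the next assertion is deferred to the appendix.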
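Proposition 3.3. Let the conditions of Theorem 3.1 hold true and denote by $\phi_m \in \mathbb{H}_m$ the
Galerkin solution of $g = \Gamma\phi$. Define $M_n^+ := M_n(a)$ with $a = ([4d\gamma_j]^{-1})_{j\ge1}$; then there is a
constant $C(d) > 0$ depending on $d$ only such that for all $n \ge 1$

\[ \sup_{\phi\in\mathcal{F}_\beta^r}\sup_{\Gamma\in\mathcal{G}_\gamma^d}\mathbb{E}\Big\{\max_{1\le m\le M_n^+}\Big(|\hat\ell_m - \ell(\phi_m)|^2 - \frac{1}{6}p_m\Big)_+\Big\} \le \frac{C(d)}{n}(\sigma^2 + r)\max\Big\{\sum_{j\ge1}\gamma_j,\ \sum_{j\ge1}\frac{[\ell]_j^2}{\beta_j}\Big\}. \]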

Additionally, let us introduce for $n \ge 1$ the integer $M_n^- := M_n(a)$ with the
sequence $a = (16d^3\gamma_j^{-1})_{j\ge1}$. In the following we decompose the risk with respect to an event
$\mathcal{E}_n$, on which $\hat p$ and $\hat M_n$ are comparable to their theoretical
counterparts, and its complement. To be more precise, we define the event

£n := {V 1 ^ m ^ M+ : ^ p^ ^ 24 P„} n {m" ^ M„ ^ M+} 

and consider the elementary identity 

sup supE|4-^('A)l'= sup snpE{\£m-£{cP)\He„) 

+ sup supE(|4-^('/')|'l£^). (3.3) 

The next proposition states that the second right hand side term is bounded up to a constant
by $n^{-1}$ and is hence negligible. The proof is deferred to the appendix.

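Proposition 3.4. Let the conditions of Theorem 3.1 hold true. If we consider the fully data-driven
choice $\hat m$ given in (2.3) then there exists a constant $C(d) > 0$ depending on $d$ only such
that for all $n \ge 1$

\[ \sup_{\phi\in\mathcal{F}_\beta^r}\sup_{\Gamma\in\mathcal{G}_\gamma^d}\mathbb{E}\big(|\hat\ell_{\hat m} - \ell(\phi)|^2\,\mathbb{1}_{\mathcal{E}_n^c}\big) \le \frac{C(d)}{n}(\sigma^2 + r)\max\Big\{\sum_{j\ge1}\gamma_j,\ \sum_{j\ge1}\frac{[\ell]_j^2}{\beta_j}\Big\}. \]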
We are now in position to prove Theorem 3.1. 

Proof of Theorem 3.1. In the following we will denote by $C(d) > 0$ a constant depending
on $d$ only, which may change from line to line. From the elementary identity (3.3) and
Proposition 3.4 we derive for all $n \ge 1$

sup sup E|4 - ^(</')|' < sup sup E(|4 - ) 
We observe that the random subsequence {af , . . . , ) , and hence (pi , ■ ■ ■ , pj^ ) , are by con-
struction non-decreasing. Furthermore, we observe that for ah 1 ^ ?n ^ A; ^ M„ the identity 
{f($k - 4>m),{(f>k - (f>m))m = [?]fc[f]fcH?]fc - [5]m[f]mM5]m holds true. Therefore, it follows 
by using that T is positive definite that [?]m[r]m"^[?]m ^ [?]i-[r]^^[?]fc, and hence ^ a^. 
Consequently, Lemma 3.2 is applicable for all 1 ^ m ^ M„ and we obtain 

|4-^('/>)l' ^7p„+78b^+42 max^ (|4 _ £(^,,)|2 _ i 



m<k<M„ 
On the event £n we deduce from the last bound that for all 1 ^ ?n ^ M„
^ 504p^+78bi+42 max - £(0^)|2 _ i p. 



Taking Lemma B.2 (v) in the appendix into account it follows for all n ^ 1 

sup sup E(|4 -£(</.) pl^ J ^C(d)(a2+r) min [(1 + log n)n-i; J^, g^] 



+ sup sup E-| max ( \tm — ^((^m)P ~ P 



O 



Moreover, Proposition 3.3 and (3.4) imply for all n ^ 1 that 



sup sup E|4 - m? ^ C{d){a^ + r) max < V 7j, ^ 

■ min 7^^[(l+logn)n-^J^,g^] (3.5) 

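where we use that $\mathcal{R}^\ell_m[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \ge n^{-1}$ for all $m \ge 1$. Under the additional

condition $\gamma_{m^\circ_n}^{-1}[\ell]_{\underline{m^\circ_n}}^t[\ell]_{\underline{m^\circ_n}} = o(n(1+\log n)^{-1})$ it is easily verified that there exists an integer $n_0$
only depending on the sequences $\beta$, $\gamma$ and $[\ell]$ such that for all $n \ge n_0$ we have $m^\circ_n \le M_n^-$ and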



min 7^^[(l + log n)n"^ g^] =7^f[(l + log n)n^Sj-^,g^]. 
However, in case $n < n_0$ we employ that



m2 m2 

7^f [(1 + log n)n-i; g^] ^ max(l, (1 + log n)n-') ^ iJl ^ ^ 



and consequently we derive the bound 



min 7?,^[(1 + log?i)n ^J«,^f']<n ^no^S^ for ah n < tIq. 

The combination of both cases yields for all n ^ 1 

min 7^1,[(l + logn)n-^J^,g^] <noV^7ef[(l + logn)n-i;J-^,g^]. 

KmsgM- Pi 

As $n_0$ depends only on the sequences $\beta$, $\gamma$ and $[\ell]$, we derive the result of the theorem from
the previous display together with (3.5), which completes the proof. □
Remark 3.2. Recall that the estimator $\hat\ell_{m^*_n}$ with optimally chosen dimension parameter $m^*_n$
is minimax-optimal, i.e., its maximal risk $\mathcal{R}^\ell[\hat\ell_{m^*_n};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$ can be bounded up to a constant by
the lower bound $\mathcal{R}^\ell_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$. However, due to Theorem 3.1 the maximal risk of the fully
adaptive estimator $\hat\ell_{\hat m}$ is bounded by a multiple of $\mathcal{R}^\ell_*[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$. The appearance of
the logarithmic factor within the rate is a known fact in the context of local estimation. It
is widely considered as an acceptable price for adaptation (in the context of non-parametric
Gaussian regression it is unavoidable as shown in Brown and Low [1996]). □



Illustration continued. In the configurations defined below Assumption 2.1 the additional
condition $\gamma_{m^\circ_n}^{-1}[\ell]_{\underline{m^\circ_n}}^t[\ell]_{\underline{m^\circ_n}} = o(n(1+\log n)^{-1})$ as $n \to \infty$ is easily verified. Therefore, the maximal
risk of the fully adaptive estimator $\hat\ell_{\hat m}$ is bounded by a multiple of $\mathcal{R}^\ell_*[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$
due to Theorem 3.1. In the next assertion we state its order in the considered cases and we
omit the straightforward calculations.

Proposition 3.5. Assume an i.i.d. sample of $(Y, X)$ of size $n$ obeying (1.1) and let the joint
distribution of the random function $X$ and the error $\varepsilon$ be normal. The obtainable rate of
convergence is determined by the order of $\mathcal{R}^\ell_*[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$ as given below.

(pp) If $p > 0$, $a > 1/2$, $p + a \ge 3/2$ and $s > 1/2 - p$, then

\[ \mathcal{R}^\ell_*[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp \begin{cases} (n^{-1}\log n)^{(2p+2s-1)/(2p+2a)}, & \text{if } s - a < 1/2,\\ n^{-1}(\log n)^2, & \text{if } s - a = 1/2,\\ n^{-1}\log n, & \text{if } s - a > 1/2. \end{cases} \]

(pe) If $p > 0$, $a > 0$, and if $s > 1/2 - p$, then

\[ \mathcal{R}^\ell_*[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp (\log n)^{-(2p+2s-1)/(2a)}. \]

(ep) If $p > 0$, $a > 1/2$ and $s \in \mathbb{R}$ then

\[ \mathcal{R}^\ell_*[(1+\log n)n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp \begin{cases} n^{-1}(\log n)^{(2p+2a-2s+1)/(2p)}, & \text{if } s - a < 1/2,\\ n^{-1}(\log n)(\log\log n), & \text{if } s - a = 1/2,\\ n^{-1}\log n, & \text{if } s - a > 1/2. \end{cases} \]

We shall briefly compare these rates with the corresponding minimax optimal rates derived
in Section 2.2 above. Surprisingly they coincide in case (pe), and hence the fully data-driven
estimator is minimax-optimal. The rates given in case (pp) coincide with the ones that have
been obtained by Goldenshluger and Pereverzev [2000] for an a priori known operator. In
comparison to the minimax optimal rates the cases (pp) and (ep) feature a deterioration of
logarithmic order as expected (compare Remark 3.2).
4 Examples: point-wise and local average estimation

Consider $\mathbb{H} = L^2[0, 1]$ with its usual norm and inner product and the trigonometric basis

\[ \psi_1 :\equiv 1, \quad \psi_{2j}(s) := \sqrt{2}\cos(2\pi js), \quad \psi_{2j+1}(s) := \sqrt{2}\sin(2\pi js), \quad s \in [0, 1],\ j \in \mathbb{N}. \]

Recall the typical choices of the sequences $\beta$ and $\gamma$ as introduced in the illustrations above.
If $\beta_j \asymp |j|^{2p}$ for a positive integer $p$, see cases (pp) and (pe), then the subset $\mathcal{F}_\beta := \{h \in
\mathbb{H} : \|h\|_\beta^2 < \infty\}$ coincides with the Sobolev space of $p$-times differentiable periodic functions (cf.
Neubauer [1988a,b]). In the case (ep) it is well-known that for $p > 1$ every element of $\mathcal{F}_\beta$ is
an analytic function (cf. Kawata [1972]). Furthermore we consider a polynomial decay of $\gamma$
with $a > 1/2$ in the cases (pp) and (ep). Easy calculus shows that the covariance operator
$\Gamma \in \mathcal{G}_\gamma^d$ acts for integer $a$ like integrating $(2a)$-times and is hence called finitely smoothing (cf.
Natterer [1984]). In the case (pe) we assume an exponential decay of $\gamma$ and it is easily seen that
the range of $\Gamma \in \mathcal{G}_\gamma^d$ is a subset of $C^\infty[0, 1]$; therefore the operator is called infinitely smoothing
(cf. Mair [1994]).

Point-wise estimation. By evaluation in a given point $t_0 \in [0, 1]$ we mean the linear functional
$\ell_{t_0}$ mapping $h$ to $h(t_0) := \ell_{t_0}(h) = \sum_{j\ge1}[h]_j\psi_j(t_0)$. In the following we shall assume
that the point evaluation is well-defined on the set of slope parameters $\mathcal{F}_\beta^r$, which is obviously
implied by $\sum_{j\ge1}[\ell_{t_0}]_j^2\beta_j^{-1} < \infty$. Consequently, the condition $\sum_{j\ge1}\beta_j^{-1} < \infty$ is sufficient to
guarantee that the point evaluation is well-defined on $\mathcal{F}_\beta^r$. Obviously, in case (ep) or in other
words for exponentially increasing $\beta$, this additional condition is automatically satisfied. However,
a polynomial increase, as in the cases (pp) and (pe), requires the assumption $p > 1/2$.
Roughly speaking, this means that the slope parameter has at least to be continuous. In order
to estimate the value $\phi(t_0)$ we consider the plug-in estimator

\[ \hat\ell_m^{\,t_0} := \begin{cases} [\ell_{t_0}]_{\underline m}^t[\hat\Gamma]_{\underline m}^{-1}[\hat g]_{\underline m}, & \text{if } [\hat\Gamma]_{\underline m} \text{ is non-singular and } \|[\hat\Gamma]_{\underline m}^{-1}\|_s \le n,\\ 0, & \text{otherwise,} \end{cases} \]

with $[\ell_{t_0}]_{\underline m} = (\psi_1(t_0), \dots, \psi_m(t_0))^t$. Moreover, we observe that $\hat\ell_m^{\,t_0} = \ell_{t_0}(\hat\phi_m) = \hat\phi_m(t_0)$.
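
The coefficient vector of the point evaluation in the trigonometric basis can be computed directly. The sketch below is an illustration only; the function names `trig_basis` and `point_evaluation_vector` are hypothetical, and the indexing follows the basis as defined at the beginning of this section.

```python
import math


def trig_basis(j, s):
    """Value of the j-th trigonometric basis function (j = 1, 2, ...) at s in [0, 1]."""
    if j == 1:
        return 1.0
    k = j // 2                      # frequency: psi_{2k} is a cosine, psi_{2k+1} a sine
    if j % 2 == 0:
        return math.sqrt(2.0) * math.cos(2.0 * math.pi * k * s)
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * k * s)


def point_evaluation_vector(t0, m):
    """Vector (psi_1(t0), ..., psi_m(t0)) representing evaluation at t0 on the first m basis functions."""
    return [trig_basis(j, t0) for j in range(1, m + 1)]
```

Every entry satisfies $[\ell_{t_0}]_j^2 \le 2$, so the coefficients of the point evaluation stay bounded but do not decay, which is why the summability condition falls entirely on $\beta$.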

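Minimax optimal point-wise estimation. The estimator's maximal mean squared error over the
classes $\mathcal{F}_\beta^r$ and $\mathcal{G}_\gamma^d$ is uniformly bounded for all $t_0 \in [0, 1]$ up to a constant by $\mathcal{R}^{\ell_{t_0}}_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$,
i.e., $\sup_{\phi\in\mathcal{F}_\beta^r}\sup_{\Gamma\in\mathcal{G}_\gamma^d}\mathbb{E}|\hat\phi_{m^*_n}(t_0) - \phi(t_0)|^2 \le C\cdot\mathcal{R}^{\ell_{t_0}}_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d]$ for some $C > 0$, which is the
minimax-optimal rate of convergence (cf. Johannes and Schenk [2010]).

Illustration continued. With $[\ell_{t_0}]_j^2 \le 2$, which corresponds to $s = 0$, we derive in the considered cases:
(pp) If $p > 1/2$, $a > 1/2$ and $p + a \ge 3/2$, then $\mathcal{R}^{\ell_{t_0}}_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp n^{-(2p-1)/(2p+2a)}$.
(pe) If $p > 1/2$ and $a > 0$, then $\mathcal{R}^{\ell_{t_0}}_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp (\log n)^{-(2p-1)/(2a)}$.
(ep) If $p > 0$ and $a > 1/2$, then $\mathcal{R}^{\ell_{t_0}}_*[n^{-1};\mathcal{F}_\beta^r,\mathcal{G}_\gamma^d] \asymp n^{-1}(\log n)^{(2a+1)/(2p)}$.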

Adaptive point-wise estimation. We select the dimension parameter $\hat m$ by minimizing the
penalized contrast function over the collection of admissible values. The obtainable rate for
the fully data-driven estimator $\hat\phi_{\hat m}(t_0)$ in the three considered cases is given as follows:

(pp) If $p > 1/2$, $a > 1/2$ and $p + a \ge 3/2$, then $\mathcal R^{\ell_{t_0}}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp (n^{-1}\log n)^{(2p-1)/(2p+2a)}$.

(pe) If $p > 1/2$ and $a > 0$, then $\mathcal R^{\ell_{t_0}}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp (\log n)^{-(2p-1)/(2a)}$.

(ep) If $p > 0$ and $a > 1/2$, then $\mathcal R^{\ell_{t_0}}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp n^{-1}(\log n)^{(2p+2a+1)/(2p)}$.

The proposed fully data-driven point-wise estimator is minimax optimal in case (pe), which is
easily seen by comparing the rates of the adaptive estimator with the corresponding minimax
rate. In the other cases, the rates deviate only by a logarithmic factor, as expected.

Point-wise estimation of derivatives. It is interesting to note that by slightly adapting
the previously presented procedure we are able to estimate the value of the $q$-th derivative of
$\phi$ at $t_0$. We use the exponential basis, which is linked to the trigonometric basis for $k \in \mathbb Z$ and
$t \in [0, 1]$ by the relation $\exp(2i\pi kt) = 2^{-1/2}(\psi_{2k}(t) + i\psi_{2k+1}(t))$ with $i^2 = -1$. We recall that for
$0 \le q < p$ the $q$-th derivative $\phi^{(q)}$ of $\phi$ in a weak sense satisfies

$$\phi^{(q)}(t_0) = \sum_{k\in\mathbb Z}(2i\pi k)^q\exp(2i\pi kt_0)\Big(\int_0^1\phi(u)\exp(-2i\pi ku)\,du\Big).$$

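Given a dimension $m \ge 1$, we denote now by $[\hat\Gamma]_m$ the $(2m+1) \times (2m+1)$ matrix with generic
elements $\langle\psi_j, \hat\Gamma\psi_k\rangle_{\mathbb H}$, $-m \le j, k \le m$ and by $[\hat g]_m$ the $(2m+1)$ vector with elements $\langle\hat g, \psi_j\rangle_{\mathbb H}$,
$-m \le j \le m$. Furthermore, we define for integer $q$ the $(2m+1)$ vector $[\ell_{t_0}^{(q)}]_m$ with elements
$[\ell_{t_0}^{(q)}]_j := (2i\pi j)^q\exp(2i\pi jt_0)$, $-m \le j \le m$. In the following we shall assume that the point
evaluation of the $q$-th derivative is well-defined on the set of slope parameters $\mathcal F_\beta^r$ which is
implied by $\sum_{j\in\mathbb Z}j^{2q}\beta_j^{-1} < \infty$ since $|[\ell_{t_0}^{(q)}]_j|^2 \lesssim j^{2q}$. Obviously, this additional condition is
automatically satisfied in case (ep) and requires the assumption $q < p - 1/2$ in the cases (pp)
and (pe). We consider the estimator of $\phi^{(q)}(t_0) = \ell_{t_0}^{(q)}(\phi)$ given by

$$\hat\phi_m^{(q)}(t_0) := \begin{cases} [\ell_{t_0}^{(q)}]_m^t[\hat\Gamma]_m^{-1}[\hat g]_m, & \text{if } [\hat\Gamma]_m \text{ is non-singular and } \|[\hat\Gamma]_m^{-1}\|_s \le n,\\ 0, & \text{otherwise.}\end{cases}$$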
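
To make the derivative functional concrete, here is a small Python sketch that forms the entries $(2i\pi k)^q\exp(2i\pi kt_0)$ and applies them to the exponential-basis coefficients of a known function. The helper names and the test function $\phi(t) = \cos(2\pi t)$ are assumptions chosen for illustration; its coefficients with respect to $\exp(2i\pi kt)$ are $1/2$ for $k = \pm 1$ and zero otherwise.

```python
import cmath
import math

def derivative_functional(m, q, t0):
    """Entries (2*i*pi*k)^q * exp(2*i*pi*k*t0) for k = -m, ..., m."""
    return {k: (2j * math.pi * k) ** q * cmath.exp(2j * math.pi * k * t0)
            for k in range(-m, m + 1)}

def apply_functional(ell, coeffs):
    """Sum of ell[k] * coeffs[k] over the retained frequencies."""
    return sum(ell[k] * coeffs.get(k, 0.0) for k in ell)

# Exponential-basis coefficients of phi(t) = cos(2*pi*t) (illustrative choice).
cos_coeffs = {-1: 0.5, 1: 0.5}

def cos_derivative(q, t0, m=3):
    """q-th derivative of cos(2*pi*t) at t0, computed through the functional."""
    return apply_functional(derivative_functional(m, q, t0), cos_coeffs)
```

With `q = 1` the result agrees with $-2\pi\sin(2\pi t_0)$, and with `q = 2` with $-4\pi^2\cos(2\pi t_0)$, up to floating-point error; the imaginary part vanishes because the coefficients are symmetric in $k$.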

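Minimax optimal point-wise estimation of derivatives. The estimator $\hat\phi_{m^*}^{(q)}(t_0)$ with appropriately
chosen dimension is minimax optimal, i.e., $\sup_{\phi\in\mathcal F_\beta^r}\sup_{\Gamma\in\mathcal G_\gamma^d}\mathbb E|\hat\phi_{m^*}^{(q)}(t_0) - \phi^{(q)}(t_0)|^2 \le
C\,\mathcal R^{\ell_{t_0}^{(q)}}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d]$ for some $C > 0$, where $\mathcal R^{\ell_{t_0}^{(q)}}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d]$ is the minimax-optimal rate of
convergence (c.f. Johannes and Schenk [2010]).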

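Illustration continued. In the considered cases we derive with $s = -q$

(pp) If $p > 1/2$, $a > 1/2$ and $p + a \ge 3/2$, then $\mathcal R^{\ell_{t_0}^{(q)}}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp n^{-(2p-2q-1)/(2p+2a)}$.

(pe) If $p > 1/2$ and $a > 0$, then $\mathcal R^{\ell_{t_0}^{(q)}}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp (\log n)^{-(2p-2q-1)/(2a)}$.

(ep) If $p > 0$ and $a > 1/2$, then $\mathcal R^{\ell_{t_0}^{(q)}}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp n^{-1}(\log n)^{(2a+2q+1)/(2p)}$.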

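Adaptive point-wise estimation of derivatives. In the three considered cases the obtainable rate
of the fully data-driven estimator $\hat\phi_{\hat m}^{(q)}(t_0)$ is given as follows:

(pp) If $p > 1/2$, $a > 1/2$ and $p + a \ge 3/2$, then
$\mathcal R^{\ell_{t_0}^{(q)}}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp (n^{-1}\log n)^{(2p-2q-1)/(2p+2a)}$.

(pe) If $p > 1/2$ and $a > 0$, then
$\mathcal R^{\ell_{t_0}^{(q)}}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp (\log n)^{-(2p-2q-1)/(2a)}$.

(ep) If $p > 0$ and $a > 1/2$, then
$\mathcal R^{\ell_{t_0}^{(q)}}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp n^{-1}(\log n)^{(2p+2a+2q+1)/(2p)}$.

Also in the situation of adaptively estimating the $q$-th derivative in a given point the obtained
rates deteriorate by a logarithmic factor in the cases (pp) and (ep) only; in case (pe) the adaptive
rate coincides with the minimax rate.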

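Local average estimation. Next we are interested in the average value of $\phi$ on the interval
$[0, b]$ for $b \in (0, 1]$. If we denote the linear functional mapping $h$ to $b^{-1}\int_0^bh(t)dt$ by $\ell^b$ then it
is easily seen that $[\ell^b]_1 = 1$, $[\ell^b]_{2j} = (\sqrt2\pi jb)^{-1}\sin(2\pi jb)$, $[\ell^b]_{2j+1} = (\sqrt2\pi jb)^{-1}(1 - \cos(2\pi jb))$ for
$j \ge 1$. In this situation the plug-in estimator $\hat\ell_m^b = b^{-1}\int_0^b\hat\phi_m(t)dt$ is written as

$$\hat\ell_m^b := \begin{cases} [\ell^b]_m^t[\hat\Gamma]_m^{-1}[\hat g]_m, & \text{if } [\hat\Gamma]_m \text{ is non-singular and } \|[\hat\Gamma]_m^{-1}\|_s \le n,\\ 0, & \text{otherwise.}\end{cases}$$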
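
The coefficients of the local-average functional can be checked numerically. The sketch below, under the assumption that the trigonometric basis is $\psi_1 \equiv 1$, $\psi_{2k}(t) = \sqrt2\cos(2\pi kt)$, $\psi_{2k+1}(t) = \sqrt2\sin(2\pi kt)$, compares a midpoint-rule approximation of $b^{-1}\int_0^b\psi_j(t)dt$ with the closed forms obtained by elementary integration. All function names are hypothetical.

```python
import math

def psi(j, t):
    """Trigonometric basis on [0, 1], 1-indexed (assumed convention)."""
    if j == 1:
        return 1.0
    k = j // 2
    if j % 2 == 0:
        return math.sqrt(2.0) * math.cos(2.0 * math.pi * k * t)
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * k * t)

def local_average_numeric(j, b, steps=20000):
    """Midpoint-rule approximation of (1/b) * integral of psi_j over [0, b]."""
    h = b / steps
    total = sum(psi(j, (i + 0.5) * h) for i in range(steps))
    return total * h / b

def local_average_closed_form(j, b):
    """Closed form of the same coefficient by direct integration."""
    if j == 1:
        return 1.0
    k = j // 2
    scale = 1.0 / (math.sqrt(2.0) * math.pi * k * b)
    if j % 2 == 0:
        return scale * math.sin(2.0 * math.pi * k * b)
    return scale * (1.0 - math.cos(2.0 * math.pi * k * b))
```

Both closed forms are bounded by $\sqrt2/(\pi kb)$ in absolute value, so the squared coefficients decay like $j^{-2}$, which is the polynomial decay of the functional exploited in the rates for local averages.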

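Minimax optimal estimation of local averages. The estimator $\hat\ell_{m^*}^b$ attains the minimax optimal
rate, i.e., $\sup_{\phi\in\mathcal F_\beta^r}\sup_{\Gamma\in\mathcal G_\gamma^d}\mathbb E|b^{-1}\int_0^b\hat\phi_{m^*}(t)dt - b^{-1}\int_0^b\phi(t)dt|^2 \le C\,\mathcal R^{\ell^b}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d]$ for $C > 0$.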

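Illustration continued. In the three cases the order of $\mathcal R^{\ell^b}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d]$ is given as follows:
(pp) If $p \ge 0$, $a > 1/2$ and $p + a > 3/2$, then $\mathcal R^{\ell^b}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp n^{-(2p+1)/(2p+2a)}$.
(pe) If $p \ge 0$ and $a > 0$, then $\mathcal R^{\ell^b}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp (\log n)^{-(2p+1)/(2a)}$.
(ep) If $p > 0$ and $a > 1/2$, then $\mathcal R^{\ell^b}[n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp n^{-1}(\log n)^{(2a-1)/(2p)}$.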

Adaptive estimation of local averages. In the three considered cases the obtainable rate of the
adaptive estimator is given below:

(pp) If $p \ge 0$, $a > 1/2$ and $p + a > 3/2$, then $\mathcal R^{\ell^b}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp (n^{-1}\log n)^{(2p+1)/(2p+2a)}$.

(pe) If $p \ge 0$ and $a > 0$, then $\mathcal R^{\ell^b}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp (\log n)^{-(2p+1)/(2a)}$.

(ep) If $p > 0$ and $a > 1/2$, then $\mathcal R^{\ell^b}[(1+\log n)n^{-1}; \mathcal F_\beta^r, \mathcal G_\gamma^d] \asymp n^{-1}(\log n)^{(2p+2a-1)/(2p)}$.

In this setting again, we notice a deterioration of logarithmic order in the cases (pp) and (ep)
only.
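
The rates listed for the three functionals can be compared numerically. The following Python sketch encodes them through a single decay parameter `s` of the functional; this unified form is an assumption inferred from the displayed cases ($s = 0$ for point evaluation, $s = -q$ for the $q$-th derivative, $s = 1$ for local averages) rather than a formula quoted from the text, and the function name is hypothetical.

```python
import math

def rate(case, p, a, n, s=0.0, adaptive=False):
    """Order of the minimax (or adaptive) rate in the cases (pp), (pe), (ep)."""
    gap = 2.0 * p + 2.0 * s - 1.0
    log_n = math.log(n)
    if case == "pp":
        # n^{-1} is replaced by n^{-1} log n for the adaptive estimator
        base = log_n / n if adaptive else 1.0 / n
        return base ** (gap / (2.0 * p + 2.0 * a))
    if case == "pe":
        # identical for the minimax and the adaptive estimator
        return log_n ** (-gap / (2.0 * a))
    if case == "ep":
        extra = 2.0 * p if adaptive else 0.0
        return log_n ** ((2.0 * a - 2.0 * s + 1.0 + extra) / (2.0 * p)) / n
    raise ValueError("case must be 'pp', 'pe' or 'ep'")

def adaptation_loss(case, p, a, n, s=0.0):
    """Ratio of the adaptive rate to the minimax rate."""
    return rate(case, p, a, n, s, adaptive=True) / rate(case, p, a, n, s)
```

Under these formulas the ratio equals $(\log n)^{(2p+2s-1)/(2p+2a)}$ in case (pp), $1$ in case (pe) and $\log n$ in case (ep), which is the logarithmic loss discussed in the three illustrations.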

Appendix

This section gathers preliminary technical results and the proofs of Propositions 3.3 and 3.4.

A Notations

We begin by defining and recalling the notations which are used in the proofs. Given an integer
$m \ge 1$, $\mathbb H_m$ denotes the subspace of $\mathbb H$ spanned by the functions $\{\psi_1, \dots, \psi_m\}$. $\Pi_m$ and $\Pi_m^\perp$
denote the orthogonal projections on $\mathbb H_m$ and its orthogonal complement $\mathbb H_m^\perp$ respectively. If
$K$ is an operator mapping $\mathbb H$ into itself and we restrict $\Pi_mK\Pi_m$ to an operator from $\mathbb H_m$ into
itself, then it can be represented by the matrix $[K]_m$. Furthermore, $[\nabla_v]_m$ and $[I]_m$ denote
the $m$-dimensional diagonal matrix with diagonal entries $(v_j)_{1\le j\le m}$ and the identity matrix
respectively. With a slight abuse of notations $\|v\|$ denotes the euclidean norm of the vector $v$.
In particular, for all $f \in \mathbb H_m$ we have $\|f\|_v^2 = [f]_m^t[\nabla_v]_m[f]_m = \|[\nabla_v]_m^{1/2}[f]_m\|^2$. Moreover, we
use the notations

$$\hat V_m = \max_{1\le k\le m}[\ell]_k^t[\hat\Gamma]_k^{-1}[\ell]_k, \qquad V_m = \max_{1\le k\le m}[\ell]_k^t[\Gamma]_k^{-1}[\ell]_k, \qquad V_m^\gamma = [\ell]_m^t[\nabla_\gamma]_m^{-1}[\ell]_m.$$

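Recall that $[\hat\Gamma]_m = \frac1n\sum_{i=1}^n[X_i]_m[X_i]_m^t$ and $[\hat g]_m = \frac1n\sum_{i=1}^nY_i[X_i]_m$ where $[\Gamma]_m = \mathbb E[X]_m[X]_m^t$
and $[g]_m = \mathbb EY[X]_m$. Given a Galerkin solution $\phi_m \in \mathbb H_m$, let $U_m := Y - \langle\phi_m, X\rangle_{\mathbb H} = \sigma\varepsilon + \langle\phi - \phi_m, X\rangle_{\mathbb H}$.
We introduce $\rho_m^2 := \mathbb EU_m^2 = \sigma^2 + \langle\Gamma(\phi - \phi_m), (\phi - \phi_m)\rangle_{\mathbb H}$, $\sigma_Y^2 := \mathbb EY^2 = \sigma^2 + \langle\Gamma\phi, \phi\rangle_{\mathbb H}$
and $\sigma_m^2 = 2(\sigma_Y^2 + [g]_m^t[\Gamma]_m^{-1}[g]_m)$ where we use that $\varepsilon$ and $X$ are uncorrelated. With these
notations we have

$$p_m = 100\,\sigma_m^2V_m(1+\log n)n^{-1}, \qquad \hat p_m = 700\,\hat\sigma_m^2\hat V_m(1+\log n)n^{-1}.$$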
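Let us define the random matrix $[\Xi]_m$ and random vector $[W]_m$, respectively, by

$$[\Xi]_m := [\Gamma]_m^{-1/2}[\hat\Gamma]_m[\Gamma]_m^{-1/2} - [I]_m, \quad\text{and}\quad [W]_m := [\hat g]_m - [\hat\Gamma]_m[\phi_m]_m,$$

where $\mathbb E[\Xi]_m = 0$, because $\mathbb E[\hat\Gamma]_m = [\Gamma]_m$, and $\mathbb E[W]_m = [\Gamma(\phi - \phi_m)]_m = 0$. Furthermore, we
introduce $\hat\sigma_Y^2 := n^{-1}\sum_{i=1}^nY_i^2$ and the events

$$\Omega_{m,n} := \{\|[\hat\Gamma]_m^{-1}\|_s \le n\}, \qquad \mho_{m,n} := \{8\|[\Xi]_m\|_s \le 1\},$$

$$A_n := \{1/2 \le \hat\sigma_Y^2/\sigma_Y^2 \le 3/2\}, \qquad B_n := \{\|[\Xi]_k\|_s \le 1/8, \ \forall\, 1 \le k \le M_n^\ell\},$$

$$C_n := \{[W]_k^t[\Gamma]_k^{-1}[W]_k \le \tfrac18([g]_k^t[\Gamma]_k^{-1}[g]_k + \sigma_Y^2), \ \forall\, 1 \le k \le M_n^\ell\}, \tag{A.1}$$

along with their respective complements $\Omega_{m,n}^c$, $\mho_{m,n}^c$, $A_n^c$, $B_n^c$ and $C_n^c$. Here and subsequently,
we will denote by $C$ a universal numerical constant and by $C(\cdot)$ a constant depending only on
the arguments. In both cases, the values of the constants may change with every appearance.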

B Preliminary results

The proof of the next lemma can be found in Johannes and Schenk [2010]. It relies on the
properties of the sequences $\beta$, $\gamma$ and $[\ell]$ given in Assumption 2.1.

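Lemma B.1. Let $\Gamma$ belong to $\mathcal G_\gamma^d$ where the sequence $\gamma$ satisfies Assumption 2.1, then we have

$$\sup_{m\in\mathbb N}\{\gamma_m\|[\Gamma]_m^{-1}\|_s\} \le 4d^3, \tag{B.1}$$

$$\sup_{m\in\mathbb N}\|[\nabla_\gamma]_m^{1/2}[\Gamma]_m^{-1}[\nabla_\gamma]_m^{1/2}\|_s \le 4d^3, \tag{B.2}$$

$$\sup_{m\in\mathbb N}\|[\nabla_\gamma]_m^{-1/2}[\Gamma]_m[\nabla_\gamma]_m^{-1/2}\|_s \le d. \tag{B.3}$$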

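Consider in addition $\phi \in \mathcal F_\beta^r$ with sequence $\beta$ satisfying Assumption 2.1. If $\phi_m$ denotes a
Galerkin solution of $g = \Gamma\phi$ then for any strictly positive sequence $w := (w_j)_{j\ge1}$ such that $w/\beta$
is non-increasing we obtain for all $m \in \mathbb N$

$$\|\phi - \phi_m\|_w^2 \le 34\,d^8\,r\,\frac{w_m}{\beta_m}\max\Big(1, \frac{\gamma_m^2}{w_m}\max_{1\le j\le m}\frac{w_j}{\gamma_j^2}\Big), \tag{B.4}$$

$$\|\phi_m\|_\beta^2 \le 34\,d^8\,r, \qquad \|\Gamma^{1/2}(\phi - \phi_m)\|_{\mathbb H}^2 \le 34\,d^9\,r\,\gamma_m\beta_m^{-1}. \tag{B.5}$$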
Furthermore, under Assumption 2.1 we have 

\i{cp - <A.)p ^ 2r I 5] M + 2(1 + d')^ f; S|. (B.6) 

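Lemma B.2. Let Assumption 2.1 be satisfied and define $D := (4d^3)$. For $\Gamma \in \mathcal G_\gamma^d$ we have

(i) $d^{-1} \le V_m/V_m^\gamma \le D$, $d^{-1} \le \gamma_m\|[\Gamma]_m^{-1}\|_s \le D$ and $d^{-1} \le \gamma_m\max_{1\le k\le m}\|[\Gamma]_k^{-1}\|_s \le D$ for
all $m \ge 1$,

(ii) $V_{M_n^+}^\gamma \le n4D(1+\log n)^{-1}$ and hence $V_{M_n^+} \le n4D^2(1+\log n)^{-1}$ for all $n \ge 1$,
(iii) $2\max_{1\le m\le M_n^+}\|[\Gamma]_m^{-1}\|_s \le n$ if $n \ge 2D$ and $\|[\ell]_{M_n^+}\|^2(1+\log n) \ge 8D^2$.
If $\phi$ belongs in addition to $\mathcal F_\beta^r$ then it holds for all $m \ge 1$

(iv) $\rho_m^2 \le \sigma_m^2 \le 2(\sigma^2 + 35d^9r)$ and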

(v) sup<^e^.supreg.{P„^ + b™,} 2mD^a^ + r)nl{{l + \ogn)n-^-Tl,G'^^). 

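Proof of Lemma B.2. Due to (B.2) - (B.3) in Lemma B.1, we have $V_m \le 4d^3[\ell]_m^t[\nabla_\gamma]_m^{-1}[\ell]_m
= DV_m^\gamma$ and $V_m^\gamma \le d\,[\ell]_m^t[\Gamma]_m^{-1}[\ell]_m \le dV_m$. Moreover, from (B.1) and (B.3) it follows that
$\|[\Gamma]_m^{-1}\|_s \le 4d^3\gamma_m^{-1}$ and $\gamma_m^{-1} \le d\|[\Gamma]_m^{-1}\|_s$. Thus, for all $m \ge 1$ we have $D \ge \|[\Gamma]_m^{-1}\|_s\gamma_m \ge d^{-1}$.
Hence, the monotonicity of $\gamma$ implies $d^{-1} \le \gamma_M\max_{1\le m\le M}\|[\Gamma]_m^{-1}\|_s \le D$. From these estimates
we obtain (i).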

Proof of (ii). Observe that VJ ^ ^ II MAf+ 11^77^/+ • case M+ = 1 the assertion is triv- 
ial, since = 71 due to Assumption 2.1. Thus, consider ^ > 1, which implies 
$\min_{1\le j\le M_n^+}\{\gamma_j\|[\ell]_j\|^{-2}\} \ge (1+\log n)/(4Dn)$, and hence $V_{M_n^+}^\gamma \le 4Dn(1+\log n)^{-1}$. Moreover,
from (i) follows $V_{M_n^+} \le 4D^2n(1+\log n)^{-1}$, which proves (ii).

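Proof of (iii). By employing that $D\gamma_{M_n^+}^{-1} \ge \max_{1\le m\le M_n^+}\|[\Gamma]_m^{-1}\|_s$ the assertion (iii) follows in
case $M_n^+ = 1$ from $\gamma_1 = 1$, while in case $M_n^+ > 1$, we use $\|[\ell]_{M_n^+}\|^2/\gamma_{M_n^+} \le 4Dn/(1+\log n)$.
Proof of (iv). Since $\varepsilon$ and $X$ are centered it follows from $[\phi_m]_m = [\Gamma]_m^{-1}[g]_m$ that $\rho_m^2 \le
2(\mathbb EY^2 + \mathbb E|\langle\phi_m, X\rangle_{\mathbb H}|^2) = 2(\sigma_Y^2 + [g]_m^t[\Gamma]_m^{-1}[g]_m) = \sigma_m^2$. Moreover, by employing successively
the inequality of Heinz [1951], i.e. $\|\Gamma^{1/2}\phi\|^2 \le d\|\phi\|_\gamma^2$, and Assumption 2.1, i.e., $\gamma$ and $\beta^{-1}$ are
non-increasing, the identity $\sigma_Y^2 = \sigma^2 + \langle\Gamma\phi, \phi\rangle_{\mathbb H}$ implies

$$\sigma_Y^2 \le \sigma^2 + d\|\phi\|_\gamma^2 \le \sigma^2 + dr. \tag{B.7}$$

Furthermore, (B.3) and (B.5) in Lemma B.1 imply

$$[g]_k^t[\Gamma]_k^{-1}[g]_k \le d\|\phi_k\|_\beta^2 \le 34d^9r. \tag{B.8}$$

The assertion (iv) follows now by combination of the estimates (B.7) and (B.8).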

Proof of (v). From Vm ^ DVm due to assertion (i) and the second inequality in (iv) we derive 

m 

Vm ^ 100(7^(1 + \ogn)n-^DVZ ^ 2m{a'^ + r)D^{l + log n)n-^ Zl^hj"^- (^-9) 

i=i 

Furthermore, by using (B.6) in Lemma B.l we obtain that 

m 

hm ^ 16dV{max(J][£]2/5-\7^/3-i^[^]27-i)}. (B.IO) 

Combining the bounds (B.9) and (B.10) implies assertion (v), which completes the proof. □
Lemma B.3. For all n, m ^ 1 we have 

Proof of Lemma B.3. Let = ||[r]m"'^||r"'^ and recah that 1 ^ M„ ^ with 



\ < ^ 4,V1 ^ m ^ M/, \ C \ ^ M„ ^ A/^ 



Im s 



II M Will' < ^^^-1' 



min ^^>i±Mlij, M = Mi. 



Given $\tau_m := \|[\Gamma]_m^{-1}\|_s^{-1}$ we have $D^{-1} \le \tau_m/\gamma_m \le d$ for all $m \ge 1$ due to (i) in Lemma B.2
which we use to prove the following two assertions

{Un < M-] C I min : ^ < U, (B.ll) 

|m„>M+|c| max — ^ a\ . (B.12) 
Obviously, the assertion of the lemma follows now by combination of (B.11) and (B.12).
Consider (B.ll) which is trivial in case M~ = 1. For M~ > 1 we have min nJv>'\\'> ^ 

^ ' n n i^m^M- 

4D(i+iogn) hence min nrl^f ^ 4(i+iogn) ^ g exploiting the last estimate we obtain 



{M„<M^}n{M„<M-}= y {m„ = m} 

M-- 

c U 



M=l 

M--1 



TM+l 1 + log 11 . Tm 1 + log 

< 1/4 



mm 




< - 










min 











while trivially |m„ = M^j n |m„ < M^j = which proves (B.ll) because M" ^ M^. 
Consider (B.12) which is trivial in case M+ = M^. If M+ < M^, then < 
and hence 

{m„ > i} n {m„ > M+} = IJ {m„ = m} 

M=A/++1 

M=M++1 



Tm ^ 1 + log 11 . Tjn ^ 1 + log n 

24ma/ " n J " [2^m™St+l) II [CP " ^ 



while |M„ = l|n|M„> M+| = which shows (B.12) and completes the proof. □ 

Lemma B.4. Let $A_n$, $B_n$ and $C_n$ as in (A.1). For all $n \ge 1$ it holds true that

$$A_n \cap B_n \cap C_n \subset \{p_k \le \hat p_k \le 24\,p_k, \ 1 \le k \le M_n^\ell\} \cap \{M_n^- \le \hat M_n \le M_n^+\}.$$

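Proof of Lemma B.4. Let $k \ge 1$. If $\|[\Xi]_k\|_s \le 1/8$, i.e., on the event $B_n$, it is easily
verified that $\|([I]_k + [\Xi]_k)^{-1} - [I]_k\|_s \le 1/7$ which we exploit to conclude

$$(6/7)\|[\Gamma]_k^{-1}\|_s \le \|[\hat\Gamma]_k^{-1}\|_s \le (8/7)\|[\Gamma]_k^{-1}\|_s \quad\text{and}$$

$$(6/7)\,s^t[\Gamma]_k^{-1}s \le s^t[\hat\Gamma]_k^{-1}s \le (8/7)\,s^t[\Gamma]_k^{-1}s, \quad\text{for all } s \in \mathbb R^k, \tag{B.13}$$

and, consequently

$$(6/7)[\hat g]_k^t[\Gamma]_k^{-1}[\hat g]_k \le [\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k \le (8/7)[\hat g]_k^t[\Gamma]_k^{-1}[\hat g]_k. \tag{B.14}$$

Moreover, from $\|[\Xi]_k\|_s \le 1/8$ we obtain after some algebra,

$$[g]_k^t[\Gamma]_k^{-1}[g]_k \le (1/16)[g]_k^t[\Gamma]_k^{-1}[g]_k + 4[W]_k^t[\Gamma]_k^{-1}[W]_k + 2[\hat g]_k^t[\Gamma]_k^{-1}[\hat g]_k,$$

$$[\hat g]_k^t[\Gamma]_k^{-1}[\hat g]_k \le (33/16)[g]_k^t[\Gamma]_k^{-1}[g]_k + 4[W]_k^t[\Gamma]_k^{-1}[W]_k.$$

Combining each of these estimates with (B.14) yields

$$(15/16)[g]_k^t[\Gamma]_k^{-1}[g]_k \le 4[W]_k^t[\Gamma]_k^{-1}[W]_k + (7/3)[\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k,$$

$$(7/8)[\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k \le (33/16)[g]_k^t[\Gamma]_k^{-1}[g]_k + 4[W]_k^t[\Gamma]_k^{-1}[W]_k.$$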

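If in addition $[W]_k^t[\Gamma]_k^{-1}[W]_k \le \tfrac18([g]_k^t[\Gamma]_k^{-1}[g]_k + \sigma_Y^2)$, i.e., on the event $C_n$, then the last two
estimates imply respectively

$$(7/16)([g]_k^t[\Gamma]_k^{-1}[g]_k + \sigma_Y^2) \le (15/16)\sigma_Y^2 + (7/3)[\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k,$$

$$(7/8)[\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k \le (41/16)[g]_k^t[\Gamma]_k^{-1}[g]_k + (1/2)\sigma_Y^2,$$

and hence in case $1/2 \le \hat\sigma_Y^2/\sigma_Y^2 \le 3/2$, i.e., on the event $A_n$, we obtain

$$(7/16)([g]_k^t[\Gamma]_k^{-1}[g]_k + \sigma_Y^2) \le (15/8)\hat\sigma_Y^2 + (7/3)[\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k,$$

$$(7/8)([\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k + \hat\sigma_Y^2) \le (41/16)[g]_k^t[\Gamma]_k^{-1}[g]_k + (29/16)\sigma_Y^2.$$

Combining the last two estimates yields

$$\tfrac16(2[g]_k^t[\Gamma]_k^{-1}[g]_k + 2\sigma_Y^2) \le (2[\hat g]_k^t[\hat\Gamma]_k^{-1}[\hat g]_k + 2\hat\sigma_Y^2) \le 3(2[g]_k^t[\Gamma]_k^{-1}[g]_k + 2\sigma_Y^2).$$

Since the last estimate and (B.13) hold for all $1 \le k \le M_n^\ell$ on the event $A_n \cap B_n \cap C_n$ it follows

$$A_n \cap B_n \cap C_n \subset \{\tfrac16\sigma_m^2 \le \hat\sigma_m^2 \le 3\sigma_m^2 \text{ and } (6/7)V_m \le \hat V_m \le (8/7)V_m, \ \forall\, 1 \le m \le M_n^\ell\}.$$

The definitions of $p_m = 100\,\sigma_m^2V_m(1+\log n)n^{-1}$ and $\hat p_m = 700\,\hat\sigma_m^2\hat V_m(1+\log n)n^{-1}$ imply

$$A_n \cap B_n \cap C_n \subset \{p_m \le \hat p_m \le 24\,p_m, \ \forall\, 1 \le m \le M_n^\ell\}. \tag{B.15}$$

On the other hand, by exploiting successively (B.13) and Lemma B.3 we obtain

$$A_n \cap B_n \cap C_n \subset \Big\{\frac67 \le \frac{\|[\hat\Gamma]_m^{-1}\|_s}{\|[\Gamma]_m^{-1}\|_s} \le \frac87, \ \forall\, 1 \le m \le M_n^\ell\Big\} \subset \{M_n^- \le \hat M_n \le M_n^+\}. \tag{B.16}$$

From (B.15) and (B.16) follows the assertion of the lemma, which completes the proof. □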

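Lemma B.5. For all $m, n \ge 1$ with $n \ge (8/7)\|[\Gamma]_m^{-1}\|_s$ we have $\mho_{m,n} \subset \Omega_{m,n}$.

Proof of Lemma B.5. Taking the identity $[\hat\Gamma]_m = [\Gamma]_m^{1/2}\{[I]_m + [\Xi]_m\}[\Gamma]_m^{1/2}$ into account,
we observe that $\|[\Xi]_m\|_s \le 1/8$ implies $\|[\hat\Gamma]_m^{-1}\|_s \le \|[\Gamma]_m^{-1}\|_s\|([I]_m + [\Xi]_m)^{-1}\|_s \le (8/7)\|[\Gamma]_m^{-1}\|_s$ due
to the usual Neumann series argument. If $n \ge (8/7)\|[\Gamma]_m^{-1}\|_s$, then the last assertion implies
$\mho_{m,n} \subset \Omega_{m,n}$, which proves the lemma. □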
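
The Neumann series argument used in the proofs of Lemma B.4 and Lemma B.5 can be spelled out as follows. This is a sketch of the standard computation for a symmetric matrix $[\Xi]_m$ with $\|[\Xi]_m\|_s \le 1/8$ and a symmetric positive definite $[\Gamma]_m$; it is a reconstruction of a routine step, not a quotation.

```latex
\begin{align*}
([I]_m + [\Xi]_m)^{-1} &= \sum_{j \ge 0} (-[\Xi]_m)^j,
  \qquad \text{since } \|[\Xi]_m\|_s \le 1/8 < 1, \\
\|([I]_m + [\Xi]_m)^{-1}\|_s &\le \sum_{j \ge 0} (1/8)^j = \frac{1}{1 - 1/8} = \frac{8}{7}, \\
\|([I]_m + [\Xi]_m)^{-1} - [I]_m\|_s &\le \sum_{j \ge 1} (1/8)^j = \frac{1/8}{1 - 1/8} = \frac{1}{7}, \\
\|[\hat\Gamma]_m^{-1}\|_s
  &= \|[\Gamma]_m^{-1/2} ([I]_m + [\Xi]_m)^{-1} [\Gamma]_m^{-1/2}\|_s
  \le \|[\Gamma]_m^{-1/2}\|_s^2 \cdot \frac{8}{7}
  = \frac{8}{7}\, \|[\Gamma]_m^{-1}\|_s .
\end{align*}
```

The second line gives the factor $8/7$ in Lemma B.5, and the third line gives the bound $1/7$ that leads to the constants $6/7$ and $8/7$ in (B.13).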

C Preliminary results due to the normality assumption

We will suppose throughout this section that the conditions of Theorem 3.1 and in particular
Assumption 2.1 are satisfied, thus, the technical Lemmas stated in Section B are applicable.
We show technical assertions under the assumption of normality (Lemmas C.1 - C.4) which are
used below to prove Propositions 3.3 and 3.4.

We begin by recalling elementary properties due to the assumption that $X$ and $\varepsilon$ are
jointly normally distributed, which are frequently used in the following proofs. For any $h \in \mathbb H$
the random variable $\langle h, X\rangle_{\mathbb H}$ is normally distributed with mean zero and variance $\langle\Gamma h, h\rangle_{\mathbb H}$.
Consider the Galerkin solution $\phi_m$ and $h \in \mathbb H_m$ then the random variables $\langle\phi - \phi_m, X\rangle_{\mathbb H}$ and
$\langle h, X\rangle_{\mathbb H}$ are independent. Thereby, $U_m = Y - \langle\phi_m, X\rangle_{\mathbb H} = \sigma\varepsilon + \langle\phi - \phi_m, X\rangle_{\mathbb H}$ and $[X]_m$
are independent, normally distributed with mean zero, and, respectively, variance $\rho_m^2$ and
covariance matrix $[\Gamma]_m$. Consequently, $(\rho_m^{-1}U_m, [X]_m^t[\Gamma]_m^{-1/2})$ is a $(m+1)$-dimensional vector
of i.i.d. standard normally distributed random variables. Let us further state elementary
inequalities for Gaussian random variables.

Lemma C.l. Let {Ui,Vij,l ^ i ^ n,l ^ j ^ m} be independent and standard normally 
distributed random variables. We have for all r] > and ( ^ 4m /n 

P{\n-y^ tiUf - 1)1 ^ , 2exp ( - ^^^^^) ; (CI) 



(C.3) 




j=i i=i 

and for all c > and oi, . . . , am ^ that 

-(-^|:-/-)^^exp(-^); 



(C.4) 



+ 1 ^ ^=^=^^ + 32cexpf-^Y (C.5) 

V / eV^c(i + iogn) V ley 



m 



\i=i i=i ) i=i 

Proof of Lemma C.1. Define $W := \sum_{i=1}^nU_i^2$ and $Z_j := (\sum_{i=1}^nU_i^2)^{-1/2}\sum_{i=1}^nU_iV_{ij}$. Obviously,
$W$ has a $\chi^2$ distribution with $n$ degrees of freedom and $Z_1, \dots, Z_m$ given $U_1, \dots, U_n$ are
independent and standard normally distributed, which we use below without further reference.
The estimate (C.1) is given in Dahlhaus and Polonik [2006] (Proposition A.1) and by using
(C.1) we have



exp 



n \ 2 

H , ^ exp 



ley vW 



n 



2 

?7 n 



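which implies (C.2). The estimate (C.3) follows analogously and we omit the details. By using
(C.1) we obtain (C.4) as follows

$$\begin{aligned}
\mathbb E\Big(n^{-1}\sum_{i=1}^nU_i^2 - 2\Big)_+ &= \int_0^\infty P\Big(n^{-1/2}\sum_{i=1}^n(U_i^2 - 1) \ge n^{1/2}(1+t)\Big)dt \\
&\le \int_0^\infty\exp\Big(-\frac{n(1+t)^2}{8(1 + (1+t))}\Big)dt \le \int_0^\infty\exp\Big(-\frac{n(1+t)}{16}\Big)dt \\
&= \exp\Big(-\frac{n}{16}\Big)\int_0^\infty\exp\Big(-\frac{n}{16}t\Big)dt = \frac{16}{n}\exp\Big(-\frac{n}{16}\Big).
\end{aligned}$$
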

Consider (C.5). Since n ^1"^ Z^"=i is standard normally distributed, we have 

(n \ /-oo " 

|n-i/2^C/ip-2c(l + logn) = / ^ [7^1 ^ (t + 2c(l + log n))i/2)dt 



^2^(t + 2c(l + logn)) 



exp 



(t + 2c(l + logn)) 



sJ-KciX + logn) Jo 
By using the last bound and (C.4) we get 

E ( |n-i/2 ^ f/.y.^|2 _ 4^(1 ^ log ^) j 



1 



exp — -t 



2 

2e~'^n" 



dt 



Y^7rc(l + logn) 



'iVFE[(|Zi|2 - 2c(l + logn))^|[/i, . . . , ^7„] + 2c(l + logn)(n-iVF - 2) 

2n-'= 



e'^Y^ 7rc(l + log n) 



(1+logn) 
+ 32c exp 



n 



n 
16 



which shows (C.5). Finally, by applying $\mathbb E[Z_1^8 \mid U_1, \dots, U_n] = 105$ and $\mathbb EW^4 = n(n+2)(n+4)(n+6)$
we obtain $\mathbb E[W^4Z_1^8] \le (11n)^4$ and hence

$$\mathbb E\Big(\sum_{j=1}^ma_jWZ_j^2\Big)^4 \le \Big(\sum_{j=1}^ma_j(\mathbb E[W^4Z_j^8])^{1/4}\Big)^4 \le (11n)^4\Big(\sum_{j=1}^ma_j\Big)^4$$

which shows (C.6) and completes the proof. □
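
The moment bound $\mathbb E[W^4Z_1^8] \le (11n)^4$ used in the last step can be checked with exact integer arithmetic. The sketch below relies only on two standard facts, namely that the $2k$-th moment of a standard normal variable is $(2k-1)!!$ and that the $k$-th raw moment of a $\chi^2_n$ variable is $n(n+2)\cdots(n+2k-2)$; the function names are hypothetical.

```python
def gaussian_even_moment(k):
    """E[Z^(2k)] for standard normal Z, i.e. (2k-1)!! = 1*3*...*(2k-1)."""
    result = 1
    for odd in range(1, 2 * k, 2):
        result *= odd
    return result

def chi2_raw_moment(n, k):
    """E[W^k] for W chi-square with n degrees of freedom."""
    result = 1
    for i in range(k):
        result *= n + 2 * i
    return result

def moment_bound_holds(n):
    """Check E[W^4] * E[Z^8] <= (11 n)^4 in exact integer arithmetic."""
    return gaussian_even_moment(4) * chi2_raw_moment(n, 4) <= (11 * n) ** 4
```

Since $Z_1$ is conditionally standard normal given $U_1, \dots, U_n$, the expectation factorises as $105 \cdot \mathbb EW^4$; the tightest case is $n = 1$, where $105 \cdot 105 = 11025 \le 11^4 = 14641$.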



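Lemma C.2. For all $n, m \ge 1$ we have

$$n^4m^{-4}\,\mathbb E\|[\Xi]_m[\Gamma]_m^{1/2}\|^8 \le (34\,\mathbb E\|X\|_{\mathbb H}^2)^4; \tag{C.7}$$

$$n^4\rho_m^{-8}\,\mathbb E\|[W]_m\|^8 \le (11\,\mathbb E\|X\|_{\mathbb H}^2)^4. \tag{C.8}$$

Furthermore, there exists a numerical constant $C$ such that for all $n \ge 1$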

max P\ > TTT ^ C*! C.9) 

i«;m^L"V4j \^ p4 ley 

max p(v^\\[E]m\\s>l) ^C; (C.IO) 
n^P({l/2 ^ a^/cj?^ ^ 3/2}") ^ C7; (C.ll) 
n^..pE(""'^li-Fl;'"^-1"-'-8(l + logn)) < C; (C.12) 

„^supE("'M'-Fl"''fl-'^8(l+log„)) <a (C.13) 
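Proof of Lemma C.2. Let $n, m \ge 1$ be fixed and denote by $(\lambda_j, e_j)_{1\le j\le m}$ an eigenvalue
decomposition of $[\Gamma]_m$. Define $U_i := (\sigma\varepsilon_i + \langle\phi - \phi_m, X_i\rangle_{\mathbb H})/\rho_m$ and $V_{ij} := (\lambda_j^{-1/2}e_j^t[X_i]_m)$,
$1 \le i \le n$, $1 \le j \le m$, where $U_1, \dots, U_n, V_{11}, \dots, V_{nm}$ are independent and standard normally
distributed random variables.

Proof of (C.7). For all $1 \le j, l \le m$ let $\delta_{jl} = 1$ if $j = l$ and zero otherwise. It is easily
verified that $\|[\Xi]_m[\Gamma]_m^{1/2}\|^2 \le \sum_{j=1}^m\sum_{l=1}^m\lambda_l|n^{-1}\sum_{i=1}^n(V_{ij}V_{il} - \delta_{jl})|^2$. Moreover, for $j \ne l$ we
have $\mathbb E|\sum_{i=1}^nV_{ij}V_{il}|^8 \le (11n)^4$ by employing (C.6) in Lemma C.1 (take $m = 1$ and $a_1 = 1$),
while $\mathbb E|\sum_{i=1}^n(V_{ij}^2 - 1)|^8 = n^4256(105/16 + 595/(2n) + 1827/n^2 + 2520/n^3) \le (34n)^4$. From
these estimates we get by successively employing Jensen's and Minkowski's inequality that

$$m^{-4}\,\mathbb E\|[\Xi]_m[\Gamma]_m^{1/2}\|^8 \le n^{-8}m^{-4}\Big(\sum_{j=1}^m\sum_{l=1}^m\lambda_l\Big(\mathbb E\Big|\sum_{i=1}^n(V_{ij}V_{il} - \delta_{jl})\Big|^8\Big)^{1/4}\Big)^4 \le n^{-4}\Big(34\sum_{j=1}^m\lambda_j\Big)^4.$$

The last estimate together with $\sum_{j=1}^m\lambda_j = \mathrm{tr}([\Gamma]_m) \le \mathrm{tr}(\Gamma) = \mathbb E\|X\|_{\mathbb H}^2$ implies (C.7).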
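
The eighth central moment of a $\chi^2_n$ variable quoted in the proof of (C.7), and the bound $(34n)^4$, can be verified exactly. The following sketch recomputes the central moment from the cumulants $\kappa_k = 2^{k-1}(k-1)!\,n$ of the $\chi^2_n$ distribution via the standard moment-cumulant recursion and compares it with the expanded form of $256n^4(105/16 + 595/(2n) + 1827/n^2 + 2520/n^3)$; the function names are hypothetical.

```python
from math import comb, factorial

def chi2_central_moment(n, r):
    """r-th central moment of chi-square(n), computed from its cumulants."""
    # kappa[1] is set to 0 because we work with the centred variable W - n
    kappa = [0, 0] + [2 ** (k - 1) * factorial(k - 1) * n for k in range(2, r + 1)]
    mu = [1] + [0] * r
    for s in range(1, r + 1):
        mu[s] = sum(comb(s - 1, k - 1) * kappa[k] * mu[s - k]
                    for k in range(2, s + 1))
    return mu[r]

def closed_form_eighth(n):
    """Integer form of 256 n^4 (105/16 + 595/(2n) + 1827/n^2 + 2520/n^3)."""
    return 1680 * n ** 4 + 76160 * n ** 3 + 467712 * n ** 2 + 645120 * n
```

The ratio of the closed form to $n^4$ is decreasing in $n$, so the bound by $(34n)^4$ is tightest at $n = 1$, where the moment equals $1190672 \le 34^4 = 1336336$.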



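Proof of (C.8) and (C.9). Taking the inequality $\sum_{j=1}^m\lambda_j \le \mathbb E\|X\|_{\mathbb H}^2$ and the identities
$n^4\rho_m^{-8}\|[W]_m\|^8 = (n^{-1}\sum_{j=1}^m\lambda_j(\sum_{i=1}^nU_iV_{ij})^2)^4$ and $([W]_m^t[\Gamma]_m^{-1}[W]_m)/\rho_m^2 = n^{-2}\sum_{j=1}^m(\sum_{i=1}^nU_iV_{ij})^2$
into account the assertions (C.8) and (C.9) follow, respectively, from (C.6) and (C.3) in Lemma
C.1 (with $a_j = \lambda_j$).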

Proof of (C.10). Since $n\|[\Xi]_m\|_s \le m\max_{1\le j,l\le m}|\sum_{i=1}^n(V_{ij}V_{il} - \delta_{jl})|$ we obtain due to
(C.1) and (C.2) in Lemma C.1 for all $\eta > 0$ the following bound

n 

{n n ^ 

P{\n-^ V^lV^2\ ^ v/m), P(|n-i/2 - 1)1 ^ n^/^rj/m) I 

i=l 1=1 ) 

1 ntf /m? 



■q/m 



2 f/ m / n ( 2 I 2 I ^\ ( I'' 

^ ?n max |(1 + ^^^^^ ) exp ~ ^ ™™ |^ /"^ 1 l/^l 1 , 2exp ~ g Y 

Moreover, for all r] this can be simplified to 

P(||[S]^||. ^ r^) ^ max |l + 2} exp ( - , 

which obviously implies (C.10).

Proof of (C.11). Since $Y_1/\sigma_Y, \dots, Y_n/\sigma_Y$ are independent and standard normally distributed,
(C.11) follows from (C.1) in Lemma C.1 by exploiting that $\{1/2 \le \hat\sigma_Y^2/\sigma_Y^2 \le 3/2\}^c \subset
\{|n^{-1}\sum_{i=1}^nY_i^2/\sigma_Y^2 - 1| > 1/2\}$.

Proof of (C.12). From the identity $n([W]_m^t[\Gamma]_m^{-1}[W]_m)/(m\rho_m^2) = m^{-1}\sum_{j=1}^m|n^{-1/2}\sum_{i=1}^nU_iV_{ij}|^2$
the estimate (C.12) follows by using (C.5) in Lemma C.1 (with $c = 2$), that is



supE = — ^ ^-8(l + logn) |n 2^ C/jyiir - 8(1 + log 



n) 

'+ \ i=l 



^ I , +64 (^ + ^°g"^ xp(-n/16)Ucn-^. 

\eV^2(l + logn) n / ;j 

Proof of (C.13). Define $V_i := ([\ell]_m^t[\Gamma]_m^{-1}[\ell]_m)^{-1/2}[\ell]_m^t[\Gamma]_m^{-1}[X_i]_m$ for $1 \le i \le n$, where
$U_1, \dots, U_n, V_1, \dots, V_n$ are independent and standard normally distributed random variables.
By employing the identity $n([\ell]_m^t[\Gamma]_m^{-1}[W]_m)^2/(\rho_m^2[\ell]_m^t[\Gamma]_m^{-1}[\ell]_m) = |n^{-1/2}\sum_{i=1}^nU_iV_i|^2$ the
estimate (C.13) follows from (C.5) in Lemma C.1, which completes the proof. □

Lemma C.3. There exists a constant C{d) only depending on d such that for all n ^ 1 



(C.14) 



sup sup Y^{m^]-n^HWUf ^ C {d){a' + r)n-\ (C.15) 



Proof of Lemma C.3. The key argument to show (C.14) is the estimate (C.12) in Lemma C.2. 
Taking [^]m[r]m^[^]m ^ and = 8 Vm ^~^^n^ " into account, together with the facts 



24 



that max^^^^j,^+ Vm = V^^+ ^ nC{d){l + \ogn) ^ and =^ o"™ ^ C{d){a'^ + r) for all 4> £ TJ^, 
r G (Lemma B.2 (ii) and (iv)) we obtain 



Art 



m=l 



m=l \ 



^ ^r^n ^M+supE 8 1 + logn 

1 + logn m^i \ mpfj 



The assertion (C.14) follows by employing (C.12) in Lemma C.2 and ^ n. The proof of 
(C.15) follows the same lines by using (C.13) in Lemma C.2 rather than (C.12) and we omit 
the details. □ 

Lemma C.4. There exists a numerical constant C and a constant C{d) only depending on d 
such that for all n ^ 1 



sup 



) sup |n4(M+)^ max P((3^,,„) ]^C; (C.16) 

sup sup in M+ max P{n^^)\i^C{d); (C.17) 
sup sup{n^P(£:^)} ^ C7. (C.18) 

Proof of Lemma C.4. Since $M_n^+ \le \lfloor n^{1/4}\rfloor$ and $\mho_{m,n}^c = \{\|[\Xi]_m\|_s > 1/8\}$ the assertion
(C.16) follows from (C.10) in Lemma C.2.

Consider (C.17). With Uo := Uoid) := exp{128d^) ^ 8d^ we have ||Mjv/+f (1 + logn) ^ 
128d^ for all n ^ Uq. We distinguish in the following the cases n < Uq and n ^ Uq. First, 
consider 1 ^ n ^ Uq. Obviously, we have M+ max-^^^^j^^+ P(r2^ „) ^ M+ ^ n~^n^J^ ^ 
C{d)n^^ since ^ n-'^/'* with Uq depending on d only. On the other hand, if n ^ Uq 

then Lemma B.2 (iii) implies n ^ 2maXj^^,^^j^,^+||[r]~^||s, and hence I3m,n C f^m.n for all 
1^771^ by using Lemma B.5. From (C.16) we conclude M+max^^^^^^+ P{VL'^ ,^) ^ 
M+ max-^^^^^j-+ P{Wr^ ,^) ^ Cn~^. By combination of the two cases we obtain (C.17). 
It remains to show (C.18). Consider the events An, Bn and C„ defined in (A.l), where Anf^Bn^ 
Cn C 8n due to Lemma B.4. Moreover, we have n^P(^^) ^ C and n'^P{C^) ^ C due to (C.ll) 
and (C.9) in Lemma C.2 (keep in mind that [n^/^J ^ and 2{alr + [g]l[T]^^[g]k) = (^l ^ pi). 
Finally, (C.IO) in Lemma C.2 implies n'^P{Bl) ^ C by using that ^ 1/8,1 ^ 

m ^ M^} C Bn- Combining these estimates yields (C.18), which completes the proof. □ 

D Proof of Propositions 3.3 and 3.4

In the following proofs we will use the notations introduced in Appendix A and we will exploit
the technical assertions gathered in Lemmas C.1 - C.4.

Proof of Proposition 3.3. From the identities $\hat\ell_m - \ell(\phi_m) = [\ell]_m^t[\hat\Gamma]_m^{-1}[W]_m1_{\Omega_{m,n}} - \ell(\phi_m)1_{\Omega_{m,n}^c}$,
$([I]_m + [\Xi]_m)^{-1} - [I]_m = -([I]_m + [\Xi]_m)^{-1}[\Xi]_m$, and $[\hat\Gamma]_m = [\Gamma]_m^{1/2}\{[I]_m + [\Xi]_m\}[\Gamma]_m^{1/2}$ follows



< mLm'iwu' + 2|[^]^^[rL^/2([i]^+ [Hy-MHyr]^V2[vi.y2 ^^^^^ 

By exploiting + [H] 

m ) ['^]m lis ^ 1/7 and ||[r]^^||s lf7,„,„ ^ n we obtain 

Taking this upper bound into account together with (Mm[r]m^[^]m) ^ V"™, we obtain for all 
(/> G J"^ and r G that 

fij^^sup^^ (|?„ - ^(0^)|2 - ^Pm)^| ^ 2 5^ E^iiCirLHw^kP - Y^P-)^ 

2 ^' 



m= 1 



m=l ^ 

+ 2n' E ^(E|l[-k[r]^'ll?)'^'(E||[W^kr)'^'mvJ)'^'+ E l^(<^'n)pP(0:;,,J. 

1 1 

m=l m=l 

We bound the first and second right hand side term with help of (C.14) and (C.15) in Lemma 
C.3, which leads to 

sup sup eJ sup (\£m-KM\^ -IPm) \ ^C{d){a^ + r)n-^ 

M+ 

+ 2n' sup sup J]^(E||[Hyr]V2||8)V4(]E||[H/U||8)V4(p^yc^ ))i/2 

M+ 

+ sup sup V |£((^^)|2p(0^„). 

Taking into account that for all $\phi \in \mathcal F_\beta^r$ and $\Gamma \in \mathcal G_\gamma^d$ we have $\max_{1\le m\le M_n^+}V_m = V_{M_n^+} \le
nC(d)(1+\log n)^{-1}$ and $\rho_m^2 \le \sigma_m^2 \le C(d)(\sigma^2 + r)$ (Lemma B.2 (ii) and (iv)) the estimates (C.7)
and (C.8) in Lemma C.2 imply

sup sup eJ sup (|£;„-^(</,^)|2_ip ^ I ^£M(a2+r) 

+ £i^(^2^^) sup sup(E||X||^)V(M+)2 max (P(Oj;,,J)'/' 

+ sup sup 5^ |^((^^)pP(l^^,J. 

By combining this upper bound, the property $\mathbb E\|X\|_{\mathbb H}^2 \le d\sum_{j\ge1}\gamma_j$ and the estimate (B.5)
given in Lemma B.1 we obtain

sup sup E J sup (\im-^{(t>m)? -]:Vrn]\ ^^^{<y^ +r) 

+ £M(^2^^)(^^.)2 supn2(M+)2 max (P(0^,J)^/' 

+ V sup sup nM+ max 

The result of the proposition follows now from the upper bounds (C.16) and (C.17) given in 
Lemma C.4, which completes the proof. □ 

Proof of Proposition 3.4. Taking the estimate $\|[\hat\Gamma]_m^{-1}\|_s1_{\Omega_{m,n}} \le n$ and the identity $\hat\ell_m -
\ell(\phi_m)1_{\Omega_{m,n}} = [\ell]_m^t[\hat\Gamma]_m^{-1}[W]_m1_{\Omega_{m,n}}$ into account it easily follows for all $m \ge 1$ that

Furthermore, by exploiting ||[^]m|P ^ n for all 1 ^ m ^ we obtain from the last estimate 

Mi 

max \im - l£c ^ 3{n3 V \\[W]rnf le^ +(sup \l{ct>m)\^ + |£((/>)|2) l^c}. 

" m=l 

We recah that for all </) G J"^ and L G we have /o^, ^ C{d){a'^ + r) and (E|| ["W^l^f ) ^2 ^ 
llE||X|||[/9^n~^ (Lemma B.2 and C.2), moreover, the bounds (sup^^^ + |^((/>)P) ^ 

{s^V^n^iUmWl + \ml)Y.,^i^^ ^ C'(d)^E,>i7f (Lemma B.l) and E||X||2, ^ dE,^i7i 
together with the last upper bound imply 



sup sup E(|£^f, -£((/)) pi^c) ^ sup sup E( max \lm - l{4>)r le-) 

^C(d)(a2+r)max|5]7,,E?[ UmI\P{E'^^)\^I'' ^ P(El)\. 

The assertion of Proposition 3.4 follows now by combination of the last estimate and (C.18) 
in Lemma C.4, which completes the proof. □ 